Abstract

The Wigner-Dyson-Gaudin-Mehta conjecture asserts that the local eigenvalue statistics of large random
matrices exhibit universal behavior depending only on the symmetry class of the matrix ensemble.
For invariant matrix models, the eigenvalue distributions are given by a log-gas with potential $V$ and
inverse temperature $\beta = 1, 2, 4$, corresponding to the orthogonal, unitary and symplectic ensembles. For
$\beta \notin \{1, 2, 4\}$, there is no natural random matrix ensemble behind this model, but the statistical physics
interpretation of the log-gas is still valid for all $\beta > 0$. The universality conjecture for invariant ensembles
asserts that the local eigenvalue statistics are independent of $V$. In this article, we review our recent
solution to the universality conjecture for both invariant and non-invariant ensembles. We will also
demonstrate that the local ergodicity of the Dyson Brownian motion is the intrinsic mechanism behind
the universality. Furthermore, we review the solution of Dyson's conjecture on the local relaxation time
of the Dyson Brownian motion. Related questions such as delocalization of eigenvectors and a local version
of Wigner's semicircle law will also be discussed.
"Perhaps I am now too courageous when I try to guess the distribution of the distances
between successive levels (of energies of heavy nuclei). Theoretically, the situation is quite
simple if one attacks the problem in a simpleminded fashion. The question is simply what are
the distances of the characteristic values of a symmetric matrix with random coefficients."

Eugene Wigner on the Wigner surmise, 1956 
1 Introduction



What do the eigenvalues of a typical large matrix look like? Do we expect certain universal patterns 
of eigenvalue statistics to emerge? Although random matrices appeared already in a concrete statistical 
application by Wishart in 1928 [77], these natural questions were not raised until the pioneering work [76] 
of E. Wigner in the 1950's. To make the problem simpler, we restrict ourselves to either real symmetric or 
complex Hermitian matrices so that the eigenvalues are real. For definiteness, we consider $N \times N$ square
matrices $H = H^{(N)} = (h_{ij})$ with matrix elements having mean zero and variance $1/N$, i.e.,

$$\mathbb{E}\, h_{ij} = 0, \qquad \mathbb{E}|h_{ij}|^2 = \frac{1}{N}, \qquad i,j = 1, 2, \ldots, N. \tag{1.1}$$

The random variables $h_{ij}$, $i,j = 1, \ldots, N$, are real or complex independent random variables subject to the
symmetry constraint $h_{ij} = \bar{h}_{ji}$. These ensembles of random matrices are called Wigner matrices. We will
always consider the limit as the matrix size goes to infinity, i.e., $N \to \infty$.

The first rigorous result about the spectrum of a random matrix of this type is the famous Wigner semicircle
law [76], which states that the empirical densities of the eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_N$, of large symmetric
or Hermitian matrices, after proper normalization such as (1.1), are given by

$$\varrho_N(x) := \frac{1}{N}\sum_{i=1}^{N}\delta(x - \lambda_i) \rightharpoonup \varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+} \tag{1.2}$$

in the weak limit as $N \to \infty$. The limit density is independent of the details of the distribution of $h_{ij}$. The
motivation for Wigner was to find a phenomenological model for the energy gap statistics of large atomic
nuclei since the energy levels of large quantum systems are impossible to compute from first principles. After 
several attempts, Wigner was convinced that random matrices were the right models. Besides the semicircle 
law, he also predicted that the eigenvalue gap distribution in the bulk of the spectrum is given by the Wigner 
surmise, e.g. in the case of symmetric matrices, 

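$$\mathbb{P}\Big(\frac{s}{N\varrho} \le \lambda_j - \lambda_{j-1} \le \frac{s + ds}{N\varrho}\Big) \approx \frac{\pi s}{2}\exp\Big(-\frac{\pi}{4}s^2\Big)\,ds,$$

where $\varrho$ is the local density of eigenvalues (see [51] for an overview).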
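
As a rough numerical illustration of the Wigner surmise, the following Python sketch samples the eigenvalue gap of $2 \times 2$ real symmetric Gaussian matrices, rescales the gaps to mean one, and compares the empirical distribution function with $1 - e^{-\pi s^2/4}$, the integral of the surmise density $\frac{\pi s}{2}e^{-\pi s^2/4}$. The function names, the sample size and the variance convention for the entries are choices made here for illustration and are not taken from the text.

```python
import math
import random


def goe_2x2_gap(rng):
    """Eigenvalue gap of a 2x2 real symmetric Gaussian matrix [[a, b], [b, c]]."""
    # Assumed convention for this sketch: the diagonal entries have twice
    # the variance of the off-diagonal entry (the usual GOE normalization).
    a = rng.gauss(0.0, math.sqrt(2.0))
    c = rng.gauss(0.0, math.sqrt(2.0))
    b = rng.gauss(0.0, 1.0)
    # The eigenvalues are (a + c)/2 +- sqrt((a - c)^2 + 4 b^2)/2.
    return math.sqrt((a - c) ** 2 + 4.0 * b * b)


def normalized_gaps(samples, seed=0):
    """Sample gaps and rescale them so that their empirical mean is one."""
    rng = random.Random(seed)
    gaps = [goe_2x2_gap(rng) for _ in range(samples)]
    mean = sum(gaps) / len(gaps)
    return [g / mean for g in gaps]


def surmise_cdf(s):
    """Distribution function of the density (pi s / 2) exp(-pi s^2 / 4)."""
    return 1.0 - math.exp(-math.pi * s * s / 4.0)


def empirical_cdf(data, s):
    """Fraction of the sample that is at most s."""
    return sum(1 for x in data if x <= s) / len(data)


if __name__ == "__main__":
    gaps = normalized_gaps(20000)
    for s in (0.5, 1.0, 1.5, 2.0):
        print(s, round(empirical_cdf(gaps, s), 3), round(surmise_cdf(s), 3))
```

For $2 \times 2$ matrices the agreement is exact up to sampling error; for large matrices the surmise is only an approximation of the true gap distribution.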

Wigner's proof of the semicircle law was a moment method via computing $\mathbb{E}\,\mathrm{Tr}\,H^n$ for each $n$. The
Wigner surmise was much more difficult to understand. In the pioneering work by Gaudin [43] the exact gap
distributions of random matrices with Gaussian distribution for matrix elements were computed in terms
of a Fredholm determinant involving Hermite polynomials. Hermite polynomials were first introduced in
the context of random matrices by Mehta and Gaudin [53] earlier. Dyson and Mehta [52, 20, 22] have
later extended this exact calculation to correlation functions and to other symmetry classes. To keep our
presentation simple, we state the corresponding results in terms of the eigenvalue correlation functions for
Hermitian $N \times N$ matrices. If $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ denotes the joint probability density of the (unordered)
eigenvalues, then the $n$-point correlation functions (marginals) are defined by

$$p_N^{(n)}(\lambda_1, \lambda_2, \ldots, \lambda_n) := \int_{\mathbb{R}^{N-n}} p_N(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}, \ldots, \lambda_N)\,d\lambda_{n+1}\cdots d\lambda_N. \tag{1.3}$$

In the Gaussian case, the joint probability density of the eigenvalues can be expressed explicitly as

$$p_N(\lambda_1, \lambda_2, \ldots, \lambda_N) = \mathrm{const.}\prod_{i<j}(\lambda_i - \lambda_j)^2\prod_{j=1}^{N} e^{-\frac{1}{2}N\lambda_j^2}. \tag{1.4}$$
The Vandermonde determinant structure allows one to compute the $k$-point correlation functions in the large
N limit via Hermite polynomials that are the orthogonal polynomials with respect to the Gaussian weight 
function. 

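The result of Dyson, Gaudin and Mehta asserts that for any fixed energy $E$ in the bulk of the spectrum,
i.e., $|E| < 2$, the small scale behavior of $p_N^{(k)}$ is given explicitly by

$$\frac{1}{[\varrho_{sc}(E)]^k}\,p_N^{(k)}\Big(E + \frac{\alpha_1}{N\varrho_{sc}(E)}, E + \frac{\alpha_2}{N\varrho_{sc}(E)}, \ldots, E + \frac{\alpha_k}{N\varrho_{sc}(E)}\Big) \rightharpoonup \det\big(K(\alpha_i, \alpha_j)\big)_{i,j=1}^{k}, \tag{1.5}$$

where $K$ is the celebrated sine kernel

$$K(x, y) = \frac{\sin\pi(x - y)}{\pi(x - y)}. \tag{1.6}$$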

Note that the limit in (1.5) is independent of the energy E as long as it is in the bulk of the spectrum. The 
rescaling by a factor $N^{-1}$ of the correlation functions in (1.5) corresponds to the typical distance between
consecutive eigenvalues and we will refer to the law under such scaling as local statistics. Similar but much 
more complicated formulas for symmetric matrices were also obtained. It is well-known that the eigenvalue 
gap distribution can be computed from the correlation functions via the inclusion-exclusion principle and 
thus (1.5) also yields a precise asymptotics for eigenvalue gap distributions. In a striking coincidence, the 
Wigner surmise, which was based on a $2 \times 2$ matrix computation, agrees with this sophisticated formula
with a typical error of only a few percentage points. Note that the correlation functions do not factorize,
i.e. the eigenvalues are strongly correlated even though the matrix elements are independent. Eigenvalues
of random matrices thus represent a strongly correlated point process obtained from independent random 
variables in a natural way. 
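
The level repulsion encoded in the sine kernel can be made concrete with a short computation. The sketch below, in Python, evaluates $\det(K(x_i, x_j))$ for a few rescaled points, assuming the standard determinantal form of the limiting correlation functions for Hermitian matrices, with $K(x, y) = \sin\pi(x-y)/(\pi(x-y))$. The helper names are hypothetical and the elimination routine is only meant for the small matrices used here.

```python
import math


def sine_kernel(x, y):
    """K(x, y) = sin(pi (x - y)) / (pi (x - y)), with K(x, x) = 1."""
    d = x - y
    if abs(d) < 1e-12:
        return 1.0
    return math.sin(math.pi * d) / (math.pi * d)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def limiting_correlation(points):
    """det(K(x_i, x_j)) for rescaled points x_1, ..., x_k."""
    return determinant([[sine_kernel(x, y) for y in points] for x in points])


if __name__ == "__main__":
    for gap in (0.0, 0.1, 0.5, 1.0, 3.0):
        print(gap, limiting_correlation([0.0, gap]))
```

The two-point function equals $1 - K(x, y)^2$, which vanishes quadratically as the two points approach each other. It differs from the product of the one-point functions, which is identically one in these units, so the limit indeed does not factorize.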

The central thesis of Wigner is the belief that the eigenvalue gap distributions for large complicated 
quantum systems are universal in the sense that they depend only on the symmetry class of the physical 
system but not on other detailed structures. This thesis has never been proved for any truly interacting 
system and there is not even a heuristically convincing argument for its correctness. Despite this, there is a
general belief that the random matrix statistics and Poisson statistics represent two paradigms of energy 
level statistics for many-body quantum systems: Poisson for independent systems and random matrix for 
highly correlated systems. In fact, these paradigms extend even to certain one-body systems such as the 
quantization of the geodesic flow in a domain or on a manifold [7, 9] or random Schrödinger operators [66].

In retrospect, Wigner's idea should have received even more attention. For centuries, the primary territory 
of probability theory was to model uncorrelated or weakly correlated systems via the law of large numbers 
or the central limit theorem. Random matrix statistics is essentially the first and only general computable 
pattern for complicated correlated systems and it is conjectured to be ubiquitous. We only mention here the 
spectacular result of Montgomery [54] which proves a special case of the conjecture (under the assumption 
of the Riemann hypothesis) that the distribution of zeros of the Riemann zeta function on the critical line
is given by random matrix statistics.

The simplest class to test Wigner's universality hypothesis upon is the random matrix ensemble itself. All 
calculations by Dyson, Gaudin and Mehta are for Gaussian ensembles, i.e., where the matrix elements $h_{ij}$ are
real or complex Gaussian random variables. These ensembles are called the Gaussian orthogonal ensemble
(GOE) and Gaussian unitary ensemble (GUE). If Wigner's universality hypothesis is correct, then the local 
eigenvalue statistics should be independent of the law of the matrix elements. This is generally referred 
to as the universality conjecture of random matrices and we will call it the Wigner-Dyson-Gaudin-Mehta 
conjecture due to the vision of Wigner and the pioneering work of these authors. It was first formulated in 
Mehta's treatise on random matrices [51] in 1967 and has remained a key question in the subject ever since.
Our goal in this paper is to review the recent progress in this direction and sketch some of the important 
ideas. 

Random matrices have been intensively studied in the last 15-20 years and we will not be able to present 
all aspects of this research. We refer the reader to recent comprehensive books [14, 16, 1]. 

The laws of random matrices can be generally divided into invariant and non-invariant ensembles. The 
invariant ensembles are characterized by a probability measure of the form $Z^{-1}e^{-N\beta\,\mathrm{Tr}\,V(H)/2}\,dH$, where $N$ is
the size of the matrix, $V$ is a real valued potential and $Z$ is the normalization constant. The parameter $\beta > 0$
is determined by the symmetry class of the model and $dH$ is the Lebesgue measure on matrices in the class.
These ensembles are called invariant since the probability law depends only on the trace of a function of the
matrix and thus is invariant under changes of coordinates. The matrix elements are in general correlated
and they are independent if and only if the model is Gaussian, i.e., $V$ is quadratic.

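For invariant ensembles, the probability distribution of the eigenvalues $\lambda = (\lambda_1, \ldots, \lambda_N)$ with
$\lambda_1 < \cdots < \lambda_N$ for the measure $e^{-N\beta\,\mathrm{Tr}\,V(H)/2}/Z$ is given by the explicit formula (cf. (1.4))

$$\mu^{(N)}_{\beta,V}(\lambda)\,d\lambda \sim e^{-\beta N\mathcal{H}(\lambda)}\,d\lambda \quad \text{with Hamiltonian} \quad \mathcal{H}(\lambda) := \sum_{k=1}^{N}\frac{1}{2}V(\lambda_k) - \frac{1}{N}\sum_{1 \le i < j \le N}\log(\lambda_j - \lambda_i), \tag{1.7}$$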

where the parameter $\beta$ is determined by the symmetry class: $\beta = 1$ for symmetric matrices, $\beta = 2$ for
Hermitian matrices and $\beta = 4$ for self-dual quaternion matrices. The key structural ingredient of this
formula, the Vandermonde determinant, is the same as in the Gaussian case, (1.4). Thus all previous
computations, developed for the Gaussian case, can be carried out for $\beta = 1, 2, 4$ provided that the Gaussian
weight function for the orthogonal polynomials is replaced with the function $e^{-\beta V(x)/2}$. Thus the analysis of
the correlation functions depends critically on the asymptotic properties of the corresponding orthogonal
polynomials. In the pioneering work of Dyson, Gaudin and Mehta, the potential is the quadratic polynomial
$V(x) = x^2/2$ and the orthogonal polynomials are the Hermite polynomials whose asymptotic properties are
well-known.
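
To see concretely how the log-gas formula (1.7) reduces to the Gaussian formula (1.4), the following Python sketch compares the two unnormalized log-densities for $\beta = 2$ and $V(x) = x^2/2$. It uses $\log|\lambda_j - \lambda_i|$ so that the ordering of the points does not matter; the function names and the sample configuration are illustrative assumptions.

```python
import math


def hamiltonian(lam, potential):
    """H(lambda) = sum_k V(lambda_k)/2 - (1/N) sum_{i<j} log|lambda_j - lambda_i|."""
    n = len(lam)
    confinement = sum(0.5 * potential(x) for x in lam)
    interaction = sum(
        math.log(abs(lam[j] - lam[i])) for i in range(n) for j in range(i + 1, n)
    )
    return confinement - interaction / n


def log_density_log_gas(lam, beta, potential):
    """Unnormalized log-density -beta * N * H(lambda) of the log-gas."""
    return -beta * len(lam) * hamiltonian(lam, potential)


def log_density_gaussian(lam):
    """Log of prod_{i<j} (l_i - l_j)^2 * prod_j exp(-N l_j^2 / 2), constant dropped."""
    n = len(lam)
    vandermonde = sum(
        2.0 * math.log(abs(lam[i] - lam[j])) for i in range(n) for j in range(i + 1, n)
    )
    return vandermonde - 0.5 * n * sum(x * x for x in lam)


if __name__ == "__main__":
    points = [-1.3, -0.2, 0.4, 1.1]
    print(log_density_log_gas(points, 2.0, lambda x: x * x / 2.0))
    print(log_density_gaussian(points))
```

Both functions return the same value for any configuration of distinct points, which reflects that the squared Vandermonde determinant is exactly the $\beta = 2$ logarithmic interaction.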

The extension of this approach to a general potential is a demanding task; important progress was made 
since the late 1990's by Fokas-Its-Kitaev [42], Bleher-Its [8], Deift et al. [14, 17, 18], Pastur-Shcherbina
[55, 56] and more recently by Lubinsky [50]. These results concern the simpler $\beta = 2$ case. For $\beta = 1, 4$, the
universality was established only quite recently for analytic $V$ with additional assumptions [15, 16, 49, 61]
using earlier ideas of Widom [75]. The final outcome of these sophisticated analyses is that universality holds
for the measure (1.7) in the sense that the short scale behavior of the correlation functions is independent of
the potential $V$ (with appropriate assumptions) provided that $\beta$ is one of the classical values, i.e., $\beta \in \{1, 2, 4\}$,
that corresponds to an underlying matrix ensemble. 

Notwithstanding matrix ensembles or orthogonal polynomials, the measure (1.7) is perfectly well defined 
for any $\beta > 0$ and it can be interpreted as the Gibbs measure for a system of particles with a logarithmic
interaction (log-gas) at inverse temperature $\beta$. It is therefore a natural question to extend universality to
non-classical $\beta$, but the orthogonal polynomial methods are difficult to apply in this case. For all $\beta > 0$ the
local statistics for the Gaussian case $V(x) = x^2/2$ can, however, be characterized by the "Brownian carousel"
[57, 74] which was derived from a tridiagonal matrix representation [19] of Gaussian random matrices. 

Apart from the invariant ensembles there are many natural non-invariant ensembles; the simplest and 
most important one being the Wigner ensemble for which the matrix elements are independent subject to 
a symmetry requirement, e.g. $h_{ij} = \bar{h}_{ji}$ in the Hermitian case. For non-invariant ensembles there is no
explicit formula analogous to (1.7) for the joint distribution of the eigenvalues. Hence the methods for the
invariant ensembles described above are not applicable. Until very recently, most rigorous results have been
on the density of eigenvalues, i.e. the convergence to the Wigner semicircle law (1.2) was established with
certain error estimates, see e.g. the works by Bai et al. [4] and Guionnet and Zeitouni [44]. The universality
of the local statistics could only be established for Hermitian Wigner matrices with a substantial Gaussian
component by Johansson [46] and Ben Arous-Péché [6]. All previous results on local universality have relied
on explicitly computable algebraic formulae. These were provided by orthogonal polynomials in case of the 
invariant ensembles, and by a modification of the Harish-Chandra/Itzykson/Zuber integral in case of [46]. 
Nevertheless, following Wigner's thesis, universality is expected to hold for general Wigner matrices as well. 

Having summarized the existing rigorous results that were available until 2008, we set the two main 
problems we wish to address in this article: 

Problem 1: Prove the Wigner-Dyson-Gaudin-Mehta conjecture, i.e. the universality for Wigner matrices 
with a general distribution for the matrix elements. 

Problem 2: Prove the universality of the local statistics for the log-gas (1.7) for all $\beta > 0$.

We were able to solve Problem 1 for a very general class of distributions. As for Problem 2, we solved 
it for the case of real analytic potentials V assuming that the equilibrium measure is supported on a single 
interval, which, in particular, holds for any convex potential. We now state our results precisely. 

Theorem 1.1 (Wigner-Dyson-Gaudin-Mehta conjecture) [26, Theorem 7.2] Suppose that $H = (h_{ij})$
is a Hermitian (respectively, symmetric) Wigner matrix. Suppose that for some $\varepsilon > 0$

$$\mathbb{E}\,\big|\sqrt{N}h_{ij}\big|^{4+\varepsilon} \le C \tag{1.8}$$

for some constant $C$. Let $n \in \mathbb{N}$ and $O : \mathbb{R}^n \to \mathbb{R}$ be compactly supported and continuous. Let $E$ satisfy
$-2 < E < 2$ and let $\xi > 0$. Then for any sequence $b_N$ satisfying $N^{-1+\xi} \le b_N \le \big||E| - 2\big|/2$ we have

$$\lim_{N\to\infty}\int_{E-b_N}^{E+b_N}\frac{dE'}{2b_N}\int_{\mathbb{R}^n}d\alpha_1\cdots d\alpha_n\,O(\alpha_1, \ldots, \alpha_n)\,\frac{1}{\varrho_{sc}(E)^n}\Big(p_N^{(n)} - p_{G,N}^{(n)}\Big)\Big(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\Big) = 0. \tag{1.9}$$

Here $\varrho_{sc}$ is the semicircle law defined in (1.2), $p_N^{(n)}$ is the $n$-point correlation function of the eigenvalue
distribution of $H$, and $p_{G,N}^{(n)}$ is the $n$-point correlation function of an $N \times N$ GUE (respectively, GOE)
matrix.

We remark that the convergence in this theorem is in the weak sense, and it also involves averaging over a
small energy interval $E' \in [E - b_N, E + b_N]$. Stronger types of convergence may also be considered and we
will comment on one possible such extension in Section 5. We believe that the issue of convergence types is
of a technical nature and it is dwarfed by the challenge to prove universality for the largest possible family of
matrix ensembles. The fundamental challenge in random matrix theory remains in answering the question
of why random matrix law is ubiquitous for seemingly disparate ensembles and physical systems. We will
present a few extensions in this direction in Sections 8 and 10.
In the case of invariant ensembles, it is well-known that for $V$ satisfying certain mild conditions the
sequence of one-point correlation functions, or densities, associated with $\mu^{(N)}_{\beta,V}$ has a limit as $N \to \infty$ and
the limiting equilibrium density $\varrho(s)$ can be obtained as the unique minimizer of the functional

$$I(\nu) = \int_{\mathbb{R}}V(t)\nu(t)\,dt - \int_{\mathbb{R}}\int_{\mathbb{R}}\log|t - s|\,\nu(s)\nu(t)\,dt\,ds.$$

Moreover, for convex $V$ the support of $\varrho$ is a single interval $[A, B]$ and $\varrho$ satisfies the equation

$$\frac{1}{2}V'(t) = \int_{\mathbb{R}}\frac{\varrho(s)\,ds}{t - s} \tag{1.10}$$

for any $t \in (A, B)$. For the Gaussian case, $V(x) = x^2/2$, the equilibrium density is given by the semicircle
law $\varrho = \varrho_{sc}$, see (1.2).

Theorem 1.2 (Bulk universality of $\beta$-ensemble) [10, Corollary 2.2] Assume $V$ is a real analytic function
with $\inf_{x\in\mathbb{R}}V''(x) > 0$. Let $\beta > 0$. Consider the $\beta$-ensemble $\mu = \mu^{(N)}_{\beta,V}$ given in (1.7) and let $p^{(n)}_{V,N}$ denote
the $n$-point correlation functions of $\mu$, defined analogously to (1.3). For the Gaussian case, $V(x) = x^2/2$,
the correlation functions are denoted by $p^{(n)}_{G,N}$. Let $E \in (A, B)$ lie in the interior of the support of $\varrho$ and
similarly let $E' \in (-2, 2)$ be inside the support of $\varrho_{sc}$. Let $O : \mathbb{R}^n \to \mathbb{R}$ be a smooth, compactly supported
function. Then for $b_N = N^{-1+\xi}$ with any $0 < \xi < 1/2$ we have

$$\lim_{N\to\infty}\int d\alpha_1\cdots d\alpha_n\,O(\alpha_1, \ldots, \alpha_n)\Bigg[\int_{E-b_N}^{E+b_N}\frac{dx}{2b_N}\frac{1}{\varrho(E)^n}\,p^{(n)}_{V,N}\Big(x + \frac{\alpha_1}{N\varrho(E)}, \ldots, x + \frac{\alpha_n}{N\varrho(E)}\Big)$$
$$\qquad - \int_{E'-b_N}^{E'+b_N}\frac{dx}{2b_N}\frac{1}{\varrho_{sc}(E')^n}\,p^{(n)}_{G,N}\Big(x + \frac{\alpha_1}{N\varrho_{sc}(E')}, \ldots, x + \frac{\alpha_n}{N\varrho_{sc}(E')}\Big)\Bigg] = 0, \tag{1.11}$$

i.e. the appropriately normalized correlation functions of the measure $\mu^{(N)}_{\beta,V}$ at the level $E$ in the bulk of
the limiting density asymptotically coincide with those of the Gaussian case and they are independent of the
value of $E$ in the bulk.

We close this introduction with some short remarks concerning these two theorems. Theorem 1.1 holds 
for a much larger class of matrix ensembles with independent entries and we will review some of them in 
Sections 8 and 10. Although Theorem 1.1 in its current form was proved in [26], the key ideas have been 
developed through several important steps in [27, 33, 36, 37, 38]. In particular, the Wigner-Dyson-Gaudin-Mehta
(WDGM) conjecture for Hermitian matrices was first solved in [27], a joint work of the current
authors with Péché, Ramírez and Schlein. This result holds whenever the distributions of matrix elements are
smooth. The smoothness requirement was partially removed in [67] and completely removed in a joint paper
with Ramírez, Schlein, Tao and Vu [28]. The WDGM conjecture for symmetric matrices was resolved in [33].
In that paper, a novel idea based on Dyson Brownian motion was introduced. The most difficult case, the
real symmetric Bernoulli matrices, was solved in [37] where a "Fluctuation Averaging Lemma" (Lemma 3.4
of the current paper) exploiting cancellation of matrix elements of the Green function was first introduced.
We will give a more detailed historical review in Section 11. 

The proof of Theorem 1.1 consists of the following three steps, discussed in Sections 3, 2 and 4, respectively. 
Our three-step strategy was first introduced in [27]. 
Step 1. Local semicircle law and delocalization of eigenvectors: It states that the density of eigenvalues is
given by the semicircle law not only as a weak limit on macroscopic scales (1.2), but also in a strong sense and
down to short scales containing only $N^{\varepsilon}$ eigenvalues for all $\varepsilon > 0$. This will imply the rigidity of eigenvalues,
i.e., that the eigenvalues are near their classical location in the sense to be made clear in Section 2. We also
obtain precise estimates on the matrix elements of the Green function which in particular imply complete
delocalization of eigenvectors.

Step 2. Universality for Gaussian divisible ensembles: The Gaussian divisible ensembles are matrices of the
form $H_t = e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,U$, where $H_0$ is a Wigner matrix and $U$ is an independent GUE matrix.
The parametrization of $H_t$ reflects that it is most conveniently obtained by an Ornstein-Uhlenbeck process.
There are two methods and both methods imply the bulk universality of $H_t$ for $t = N^{-\tau}$ for the entire range
of $0 < \tau < 1$ with different estimates.

2a Proposition 3.1 of [27] which uses an extension of Johansson's formula [46]. 
2b Local ergodicity of the Dyson Brownian motion (DBM).

The approach in 2a yields a slightly stronger estimate than the approach in 2b, but it works only in the
Hermitian case. In this review, we will focus on the Dyson Brownian motion approach.

Step 3. Approximation by Gaussian divisible ensembles: It is a simple density argument in the space of 
matrix ensembles which shows that for any probability distribution of the matrix elements there exists a 
Gaussian divisible distribution with a small Gaussian component, as in Step 2, such that the two associated 
Wigner ensembles have asymptotically identical local eigenvalue statistics. The first implementation of this 
approximation scheme was via a reverse heat flow argument [27]; it was later replaced by the Green function 
comparison theorem [36]. 
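
The interpolation in Step 2 can be sketched at the level of a single matrix element. The Python snippet below, a minimal illustration and not the construction used in the proofs, takes a Bernoulli entry $\pm N^{-1/2}$, mixes it with an independent real Gaussian entry of variance $1/N$ according to $e^{-t/2}h_0 + \sqrt{1 - e^{-t}}\,u$, and checks that the variance $1/N$ is preserved for every $t$. The Bernoulli initial law, the real-valued setting and all names are assumptions of this sketch.

```python
import math
import random


def gaussian_divisible_entry(h0, u, t):
    """One matrix element of H_t = exp(-t/2) H_0 + sqrt(1 - exp(-t)) U."""
    return math.exp(-t / 2.0) * h0 + math.sqrt(1.0 - math.exp(-t)) * u


def sample_entries(n_matrix, t, samples, seed=0):
    """Sample entries of H_t for a Bernoulli H_0 with entries +-1/sqrt(N)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_matrix)
    entries = []
    for _ in range(samples):
        h0 = scale if rng.random() < 0.5 else -scale
        u = rng.gauss(0.0, scale)  # Gaussian entry with variance 1/N
        entries.append(gaussian_divisible_entry(h0, u, t))
    return entries


def variance_times_n(entries, n_matrix):
    """N times the empirical variance; this should stay close to one."""
    mean = sum(entries) / len(entries)
    return n_matrix * sum((x - mean) ** 2 for x in entries) / len(entries)


if __name__ == "__main__":
    for t in (0.0, 0.01, 0.1, 1.0):
        print(t, variance_times_n(sample_entries(100, t, 50000), 100))
```

The weights satisfy $e^{-t} + (1 - e^{-t}) = 1$, which is why the normalization (1.1) survives along the flow while the law of each entry acquires a Gaussian component of relative size $1 - e^{-t} \approx t$ for small $t$.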

The proof of Theorem 1.2 consists of the following two steps that will be presented in Sections 6 and 7. 

Step 1. Rigidity of eigenvalues. This establishes that the locations of the eigenvalues are not too far from
their classical locations determined by the equilibrium density $\varrho(s)$.

Step 2. Uniqueness of local Gibbs measures with logarithmic interactions. With the precision of eigenvalue
location estimates from Step 1 as an input, the eigenvalue spacing distributions are shown to be given by
the corresponding Gaussian ones. (We will take the uniqueness of the spacing distributions as our definition 
of the uniqueness of Gibbs state.) 

There are several similarities and differences between these two methods. Both start with rigidity esti- 
mates on eigenvalues and then establish that the local spacing distributions are the same as in the Gaussian 
cases. The Gaussian divisible ensembles, which play a key role in our theory for noninvariant ensembles, are 
completely absent for invariant ensembles. The key connection between the two methods, however, is the 
usage of DBM (or its analogue) in Step 2 of both proofs. In Section 2, we will first present this idea.

The method for the proof of Theorem 1.1 is extremely general. As of this writing, it has been applied 
to the generalized Wigner ensembles, the sample covariance ensembles and the Erdős-Rényi matrices for
a certain range of the sparseness parameter. It can also be extended to the edges of the spectrum, and it
yields edge universality under more general conditions than were previously known. This will be reviewed
in Section 9. Extensions to generalized Wigner matrices and Erdős-Rényi matrices will also be discussed in
Sections 8 and 10. As the proof of Theorem 1.2 was just completed, we do not know how far this method can
reach; currently we can generalize the result to the nonconvex case under the assumption that the equilibrium
measure $\varrho$ is supported on a single interval [11]. The theory we have developed to prove Theorems 1.1 and 1.2
is purely analytic and we believe that it unveils the genuine mechanism of the Wigner-Dyson-Gaudin-Mehta
universality. Finally, a short summary concerning the recent history of universality is given in Section 11.

Acknowledgement. The results in this review were obtained in collaboration with Benjamin Schlein, Jun
Yin, Antti Knowles and Paul Bourgade and, in some works, also with José Ramírez and Sandrine Péché. This
article reports the joint progress with these authors.

2 Dyson Brownian motion and the local relaxation flow 

2.1 Concept and results 

The Dyson Brownian motion (DBM) describes the evolution of the eigenvalues of a Wigner matrix as an
interacting point process if each matrix element $h_{ij}$ evolves according to independent (up to symmetry
restriction) Brownian motions. We will slightly alter this definition by generating the dynamics of the
matrix elements by an Ornstein-Uhlenbeck (OU) process which leaves the standard Gaussian distribution
invariant. In the Hermitian case, the OU process for the rescaled matrix elements $v_{ij} := N^{1/2}h_{ij}$ is given by
the stochastic differential equation

$$dv_{ij} = d\beta_{ij} - \frac{1}{2}v_{ij}\,dt, \qquad i,j = 1, 2, \ldots, N, \tag{2.1}$$

where $\beta_{ij}$, $i < j$, are independent complex Brownian motions with variance one and $\beta_{ii}$ are real Brownian
motions of the same variance. Denote the distribution of the eigenvalues $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ of $H_t$ at time
$t$ by $f_t(\lambda)\mu_G(d\lambda)$, where $\mu_G$ is given by (1.7) with the potential $V(x) = x^2/2$.
Then $f_t = f_{t,N}$ satisfies [21]

$$\partial_t f_t = \mathcal{L}f_t, \tag{2.2}$$

where

$$\mathcal{L} = \mathcal{L}_N := \sum_{i=1}^{N}\frac{1}{2N}\partial_i^2 + \sum_{i=1}^{N}\Big(-\frac{\beta}{4}\lambda_i + \frac{\beta}{2N}\sum_{j\ne i}\frac{1}{\lambda_i - \lambda_j}\Big)\partial_i. \tag{2.3}$$

The parameter $\beta$ is chosen as follows: $\beta = 2$ for complex Hermitian matrices and $\beta = 1$ for symmetric
real matrices. Our formulation of the problem has already taken into account Dyson's observation that the
invariant measure for this dynamics is $\mu_G$. A natural question regarding the DBM is how fast the dynamics
reaches equilibrium. Dyson had already posed this question in 1962:

Dyson's conjecture [21]: The global equilibrium of DBM is reached in time of order one and the local
equilibrium (in the bulk) is reached in time of order $1/N$. Dyson further remarked,

"The picture of the gas coming into equilibrium in two well-separated stages, with microscopic and
macroscopic time scales, is suggested with the help of physical intuition. A rigorous proof that this
picture is accurate would require a much deeper mathematical analysis."

We will prove that Dyson's conjecture is correct if the initial data of the flow is a Wigner ensemble, which 
was Dyson's original interest. Our result in fact is valid for DBM with much more general initial data that 
we now survey. Briefly, it will turn out that the global equilibrium is indeed reached within a time of order 
one, but local equilibrium is achieved much faster if an a-priori estimate on the location of the eigenvalues 
(also called points) is satisfied. To formulate this estimate, let $\gamma_j = \gamma_{j,N}$ denote the location of the $j$-th
point under the semicircle law, i.e., $\gamma_j$ is defined by

$$N\int_{-\infty}^{\gamma_j}\varrho_{sc}(x)\,dx = j, \qquad 1 \le j \le N. \tag{2.4}$$

We will call $\gamma_j$ the classical location of the $j$-th point.
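
The classical locations can be computed numerically from the defining relation $N\int_{-\infty}^{\gamma_j}\varrho_{sc}(x)\,dx = j$. A minimal Python sketch follows, assuming the closed form of the semicircle distribution function and a plain bisection; both are choices of this illustration, not of the text, and the function names are hypothetical.

```python
import math


def semicircle_cdf(x):
    """Integral of rho_sc(s) = sqrt((4 - s^2)_+) / (2 pi) over (-infinity, x]."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return (
        0.5
        + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
        + math.asin(x / 2.0) / math.pi
    )


def classical_location(j, n, tol=1e-12):
    """Solve N * cdf(gamma_j) = j for gamma_j by bisection on [-2, 2]."""
    target = j / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 10
    print([round(classical_location(j, n), 4) for j in range(1, n + 1)])
```

The locations are symmetric about the origin, and consecutive locations in the bulk are spaced at distance of order $1/N$, while near the edges $\pm 2$ the spacing is larger because the density vanishes there.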
A-priori Estimate: There exists an $a > 0$ such that

$$Q := \sup_{t \ge N^{-2a}}\frac{1}{N}\int\sum_{j=1}^{N}(\lambda_j - \gamma_j)^2 f_t(\lambda)\mu_G(d\lambda) \le CN^{-1-2a} \tag{2.5}$$

with a constant $C$ uniformly in $N$. (This a-priori estimate was referred to as Assumption III in [33, 34].)

The main result on the local ergodicity of Dyson Brownian motion states that if the a-priori estimate 
(2.5) is satisfied then the local correlation functions of the measure $f_t\mu_G$ are the same as the corresponding
ones for the Gaussian measure, $\mu_G = f_\infty\mu_G$, provided that $t$ is larger than $N^{-2a}$. The $n$-point correlation
functions of the probability measure $f_t\,d\mu_G$ are defined, similarly to (1.3), by

$$p^{(n)}_{t,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}}f_t(\mathbf{x})\mu_G(\mathbf{x})\,dx_{n+1}\cdots dx_N, \qquad \mathbf{x} = (x_1, x_2, \ldots, x_N). \tag{2.6}$$

Due to the convention that one can view the locations of eigenvalues as the coordinates of particles, we have
used $\mathbf{x}$, instead of $\lambda$, in the last equation. From now on, we will use both conventions depending on which
viewpoint we wish to emphasize. Notice that the probability distribution of the eigenvalues at the time $t$,
$f_t\mu_G$, is the same as that of the Gaussian divisible matrix:

$$H_t = e^{-t/2}H_0 + (1 - e^{-t})^{1/2}\,U, \tag{2.7}$$

where $H_0$ is the initial Wigner matrix and $U$ is an independent standard GUE (or GOE) matrix. This
establishes the universality of the Gaussian divisible ensembles. The precise statement is the following 
theorem: 

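Theorem 2.1 [34, Theorem 2.1] Suppose that the a-priori estimate (2.5) holds for the solution $f_t$ of the
forward equation (2.2) with some exponent $a > 0$. Let $E \in (-2, 2)$ and $b > 0$ such that $[E - b, E + b] \subset (-2, 2)$.
Then for any $\delta > 0$, for any integer $n \ge 1$ and for any compactly supported continuous test function
$O : \mathbb{R}^n \to \mathbb{R}$, we have

$$\lim_{N\to\infty}\sup_{t \ge N^{-2a+\delta}}\int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^n}d\alpha_1\cdots d\alpha_n\,O(\alpha_1, \ldots, \alpha_n)\,\frac{1}{\varrho_{sc}(E)^n}\Big(p^{(n)}_{t,N} - p^{(n)}_{G,N}\Big)\Big(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\Big) = 0. \tag{2.8}$$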



We can choose $b = b_N$ depending on $N$. In [34] explicit bounds on the speed of convergence and the
optimal range of $b$ were also established. In particular, thanks to the optimal rigidity estimate [38], i.e., (2.5)
with $a = 1/2$, the range of the energy averaging in (2.8) was reduced to $b_N \ge N^{-1+\xi}$, $\xi > 0$, but only for
$t \ge N^{-\xi/8}$ (Theorem 2.3 of [38]).

Theorem 2.1 is a consequence of the following theorem which identifies the gap distribution of the eigen- 
values. 
Theorem 2.2 (Universality of the Dyson Brownian motion for short time) [34, Theorem 4.1]
Suppose $\beta \ge 1$ and let $G : \mathbb{R} \to \mathbb{R}$ be a smooth function with compact support. Then for any sufficiently
small $\varepsilon > 0$, independent of $N$, there exist constants $C, c > 0$, depending only on $\varepsilon$ and $G$ such that for any
$J \subset \{1, 2, \ldots, N - 1\}$ we have

$$\Big|\int\frac{1}{|J|}\sum_{i\in J}G\big(N(x_i - x_{i+1})\big)\,f_t\,d\mu_G - \int\frac{1}{|J|}\sum_{i\in J}G\big(N(x_i - x_{i+1})\big)\,d\mu_G\Big| \le CN^{\varepsilon}\sqrt{\frac{N^2Q}{|J|\,t}} + Ce^{-cN^{\varepsilon}}, \tag{2.9}$$

where $Q$ denotes the left-hand side of (2.5). In particular, if the a-priori estimate (2.5) holds with some $a > 0$
and $|J|$ is of order $N$, then for any $t > N^{-2a+3\varepsilon}$ the right hand side converges to zero as $N \to \infty$, i.e. the gap
distributions for $f_t\,d\mu_G$ and $d\mu_G$ coincide.

The test functions can be generalized to

$$G\big(N(x_i - x_{i+1}), N(x_{i+1} - x_{i+2}), \ldots, N(x_{i+n-1} - x_{i+n})\big) \tag{2.10}$$

for any fixed $n$, which is needed to identify higher order correlation functions. In applications, $J$ is chosen
to be the indices of the eigenvalues in the interval $[E - b, E + b]$ and thus $|J| \sim Nb$. This identifies the
gap distributions of eigenvalues completely and thus also identifies the correlation functions and concludes
Theorem 2.1. Note that the input of this theorem, the a-priori estimate (2.5), identifies the location of
the eigenvalues only on a scale $N^{-1/2-a}$, which is much weaker than the $1/N$ precision for the eigenvalue
differences in (2.9).

By the rigidity estimates (see Corollary 3.2 below), the a-priori estimate (2.5) holds for any a < 1/2 if 
the initial data of the DBM is a Wigner ensemble. Therefore, Theorem 2.2 holds for any $t > N^{-1+\varepsilon}$ for any
$\varepsilon > 0$ and this establishes Dyson's conjecture.

2.2 Main ideas behind the proof of Theorem 2.2 

The key method is to analyze the relaxation to equilibrium of the dynamics (2.2). This approach was first 
introduced in Section 5.1 of [33]; the presentation here follows [34]. 

We start with a short review of the logarithmic Sobolev inequality for a general measure. Let the
probability measure $\mu$ on $\mathbb{R}^N$ be given by a general Hamiltonian $\mathcal{H}$:

$$d\mu(\mathbf{x}) = \frac{e^{-N\mathcal{H}(\mathbf{x})}}{Z}\,d\mathbf{x}, \tag{2.11}$$

and let $\mathcal{L}$ be the generator, symmetric with respect to the measure $d\mu$, defined by the associated Dirichlet
form

$$D(f) = D_\mu(f) = -\int f\mathcal{L}f\,d\mu = \frac{1}{2N}\sum_{j=1}^{N}\int(\partial_j f)^2\,d\mu, \qquad \partial_j = \partial_{x_j}. \tag{2.12}$$

Recall the relative entropy of two probability measures:

$$S(\nu|\mu) := \int\frac{d\nu}{d\mu}\log\Big(\frac{d\nu}{d\mu}\Big)\,d\mu.$$

If $d\nu = f\,d\mu$, then we will sometimes use the notation $S_\mu(f) := S(f\mu|\mu)$. The entropy can be used to control
the total variation norm via the well known inequality

$$\int|f - 1|\,d\mu \le \sqrt{2S_\mu(f)}. \tag{2.13}$$
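
As a sanity check of (2.13), which is Pinsker's inequality in the form $\int|f - 1|\,d\mu \le \sqrt{2S_\mu(f)}$ with $S_\mu(f) = \int f\log f\,d\mu$, the following Python sketch tests it on randomly generated discrete probability vectors. The discrete setting and the helper names are assumptions made for illustration.

```python
import math
import random


def relative_entropy(nu, mu):
    """S(nu | mu) = sum_i nu_i log(nu_i / mu_i) for discrete probability vectors."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0.0)


def l1_distance(nu, mu):
    """Discrete analogue of the integral of |f - 1| d mu with f = d nu / d mu."""
    return sum(abs(p - q) for p, q in zip(nu, mu))


def random_distribution(size, rng):
    """A random probability vector with strictly positive weights."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        mu = random_distribution(8, rng)
        nu = random_distribution(8, rng)
        bound = math.sqrt(2.0 * relative_entropy(nu, mu))
        print(round(l1_distance(nu, mu), 4), "<=", round(bound, 4))
```

The inequality is what allows entropy decay along the flow to be converted into closeness of expectations of bounded observables.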
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Let ft be the solution to the evolution equation 

d t ft = J? f u t > 0, (2.14) 
with a given initial condition fa. The evolution of the entropy S^ft) = 5(/t/x|/x) satisfies 

dtS^ft) = -AD^yfft). (2.15) 
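By Bakry and Emery [5], the evolution of the Dirichlet form satisfies the inequality

$$ \partial_t D_\mu(\sqrt{f_t}) \le -\frac{1}{2N}\int (\nabla\sqrt{f_t})(\nabla^2\mathcal{H})\nabla\sqrt{f_t}\, d\mu. \tag{2.16} $$

If the Hamiltonian is convex, i.e.,

$$ \nabla^2\mathcal{H}(x) = \mathrm{Hess}\,\mathcal{H}(x) \ge \vartheta \quad \text{for all } x\in\mathbb{R}^N \tag{2.17} $$

with some constant $\vartheta > 0$, then we have

$$ \partial_t D_\mu(\sqrt{f_t}) \le -\vartheta D_\mu(\sqrt{f_t}). \tag{2.18} $$

Integrating (2.15) and (2.18) back from infinity to 0, we obtain the logarithmic Sobolev inequality (LSI)

$$ S_\mu(f) \le \frac{4}{\vartheta} D_\mu(\sqrt{f}), \qquad f = f_0, \tag{2.19} $$

and the exponential relaxation of the entropy and Dirichlet form on time scale $t \sim 1/\vartheta$

$$ S_\mu(f_t) \le e^{-t\vartheta} S_\mu(f_0), \qquad D_\mu(\sqrt{f_t}) \le e^{-t\vartheta} D_\mu(\sqrt{f_0}). \tag{2.20} $$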
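As a sanity check of (2.13) and (2.20), one can look at the simplest one-dimensional case $N = 1$, $\mathcal{H}(x) = \vartheta x^2/2$, where everything is explicit. This is a toy illustration chosen here, not an example taken from the argument above. Under the dynamics $dx = -(\vartheta/2)x\,dt + dB$, a Gaussian initial law with the equilibrium variance $1/\vartheta$ and mean $m_0$ stays Gaussian with mean $m_t = m_0 e^{-\vartheta t/2}$, its relative entropy is $\vartheta m_t^2/2$, and $\int |f-1|\,d\mu = 2\,\mathrm{erf}(|m_t|\sqrt{\vartheta}/(2\sqrt{2}))$. The function names in the sketch are hypothetical.

```python
import math

def entropy(m, theta):
    """Relative entropy S(N(m, 1/theta) | N(0, 1/theta)) = theta * m^2 / 2."""
    return 0.5 * theta * m * m

def l1_distance(m, theta):
    """Integral of |f - 1| d(mu) for f = dN(m, 1/theta) / dN(0, 1/theta)."""
    return 2.0 * math.erf(abs(m) * math.sqrt(theta) / (2.0 * math.sqrt(2.0)))

def mean_at(t, m0, theta):
    """Mean at time t for dx = -(theta/2) x dt + dB started with mean m0."""
    return m0 * math.exp(-0.5 * theta * t)

def check(m0=3.0, theta=0.5, times=(0.0, 0.5, 1.0, 2.0, 5.0)):
    """Rows (t, L1 distance, sqrt(2 S_t), S_t, exp(-t*theta) * S_0)."""
    s0 = entropy(m0, theta)
    rows = []
    for t in times:
        m = mean_at(t, m0, theta)
        s = entropy(m, theta)
        rows.append((t, l1_distance(m, theta), math.sqrt(2.0 * s), s,
                     math.exp(-theta * t) * s0))
    return rows

for t, l1, pinsker, s, bound in check():
    print(f"t={t:4.1f}  L1={l1:.4f} <= sqrt(2S)={pinsker:.4f}   "
          f"S={s:.4f} vs exp(-t*theta)*S0={bound:.4f}")
```

In this Gaussian toy model the entropy bound in (2.20) holds with equality, which shows that the rate $\vartheta$ cannot be improved in general.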

As a consequence of the logarithmic Sobolev inequality, we also have the concentration inequality for any $k$ and $a > 0$

$$ \int \mathbf{1}\big(|x_k - \mathbb{E}_\mu(x_k)| > a\big)\, d\mu \le 2 e^{-\vartheta N a^2/2}. \tag{2.21} $$

We will not use this inequality in this section, but it will become important in Section 6. 

Returning to the classical ensembles, we assume from now on that $\mathcal{H}$ is given by (1.7) with $V(x) = x^2/2$ and the equilibrium measure is the Gaussian one, $\mu = \mu_G$. We then have the convexity inequality

$$ \langle v, \nabla^2\mathcal{H}(x)\, v\rangle \ge \frac12 \|v\|^2 + \frac1N \sum_{i<j} \frac{(v_i - v_j)^2}{(x_i - x_j)^2} \ge \frac12\|v\|^2, \qquad v\in\mathbb{R}^N. \tag{2.22} $$

This guarantees that $\mu$ satisfies the LSI with $\vartheta = 1/2$ and the relaxation time to equilibrium is of order one.
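The two terms in the middle expression of (2.22) can be compared numerically. The following sketch evaluates them for $N$ equally spaced points on $[-2,2]$ (an illustrative configuration, not a sample from the ensemble) and for two test directions: a global shift and a local difference of two neighbours. The function names are hypothetical.

```python
def hessian_form(x, v):
    """Return the two terms of the middle expression in (2.22):
    (1/2)|v|^2 and (1/N) * sum_{i<j} (v_i - v_j)^2 / (x_i - x_j)^2."""
    n = len(x)
    confinement = 0.5 * sum(vi * vi for vi in v)
    interaction = sum((v[i] - v[j]) ** 2 / (x[i] - x[j]) ** 2
                      for i in range(n) for j in range(i + 1, n)) / n
    return confinement, interaction

def rayleigh_quotient(x, v):
    """Lower bound on <v, Hess H v> / |v|^2 given by (2.22)."""
    confinement, interaction = hessian_form(x, v)
    return (confinement + interaction) / sum(vi * vi for vi in v)

N = 200
x = [-2.0 + 4.0 * (k + 0.5) / N for k in range(N)]   # spacing 4/N

v_global = [1.0] * N                                  # all particles move together
v_local = [0.0] * N
v_local[100], v_local[101] = 1.0, -1.0                # two neighbours move apart

print("global mode:", rayleigh_quotient(x, v_global))
print("local mode: ", rayleigh_quotient(x, v_local))
```

The global mode only sees the convexity $1/2$ of the confining potential, while the local mode sees a convexity of order $N$ in this configuration; this is the gap between the global and the local relaxation time.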

The key idea is that the relaxation time is in fact much shorter than order one for local observables that depend only on the eigenvalue differences. Equation (2.22) shows that the relaxation in the direction $v_i - v_j$ is much faster than order one provided that $x_i$ and $x_j$ are close. However, this effect is hard to exploit directly because all modes of different wavelengths are coupled. Our idea is to add an auxiliary strongly convex potential $W(x)$ to the Hamiltonian to "speed up" the convergence to local equilibrium. On the other hand, we will also show that the cost of this speeding up can be effectively controlled if the a-priori estimate (2.5) holds.

The auxiliary potential $W(x)$ is defined by

$$ W(x) := \sum_{j=1}^N W_j(x_j), \qquad W_j(x) := \frac{1}{2\tau}(x_j - \gamma_j)^2, \tag{2.23} $$

i.e. it is a quadratic confinement on scale $\sqrt{\tau}$ for each eigenvalue near its classical location, where the parameter $\tau > 0$ will be chosen later. The total Hamiltonian is given by

$$ \widetilde{\mathcal{H}} := \mathcal{H} + W, \tag{2.24} $$

where $\mathcal{H}$ is the Gaussian Hamiltonian given by (1.7). The measure with Hamiltonian $\widetilde{\mathcal{H}}$,

$$ d\omega := \omega(x)\,dx, \qquad \omega := e^{-N\widetilde{\mathcal{H}}}/\widetilde{Z}, $$

will be called the local relaxation measure. This measure was named the pseudo-equilibrium measure in our previous papers.

The local relaxation flow is defined to be the flow with the generator characterized by the natural Dirichlet form w.r.t. $\omega$, explicitly, $\widetilde{\mathcal{L}}$:

$$ \widetilde{\mathcal{L}} = \mathcal{L} - \sum_j b_j\partial_j, \qquad b_j = W_j'(x_j) = \frac{x_j - \gamma_j}{\tau}. \tag{2.25} $$

We will typically choose $\tau \ll 1$ so that the additional term $W$ substantially increases the lower bound (2.17) on the Hessian, hence speeding up the dynamics so that the relaxation time is at most $\tau$.
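The objects entering (2.23) and (2.25) are easy to compute in the Gaussian case. The sketch below assumes that the classical locations of (2.4) are defined by $\int_{-\infty}^{\gamma_j}\varrho_{sc} = j/N$ (the same convention as in (6.2)) and that $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$; both are assumptions about formulas stated outside this section, and the function names are hypothetical.

```python
import math

def semicircle_cdf(x):
    """Distribution function of rho_sc(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_locations(n):
    """gamma_j with semicircle_cdf(gamma_j) = j/n for j = 1..n, found by bisection."""
    gammas = []
    for j in range(1, n + 1):
        lo, hi = -2.0, 2.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < j / n:
                lo = mid
            else:
                hi = mid
        gammas.append(0.5 * (lo + hi))
    return gammas

def aux_potential(x, gammas, tau):
    """W(x) = sum_j (x_j - gamma_j)^2 / (2 tau), as in (2.23)."""
    return sum((xj - gj) ** 2 for xj, gj in zip(x, gammas)) / (2.0 * tau)

def drift(x, gammas, tau):
    """b_j = W_j'(x_j) = (x_j - gamma_j) / tau, as in (2.25)."""
    return [(xj - gj) / tau for xj, gj in zip(x, gammas)]

gammas = classical_locations(8)
x = [g + 0.05 for g in gammas]          # every particle displaced by 0.05
print(aux_potential(x, gammas, tau=0.01), drift(x, gammas, tau=0.01)[:3])
```

Since the Hessian of $W$ is $\tau^{-1}$ times the identity, a displacement of size $\sqrt{\tau}$ per particle already costs an amount of order $N$ in $W$; this is the sense in which $W$ confines each $x_j$ to scale $\sqrt{\tau}$ around $\gamma_j$.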

The idea of adding an artificial potential W to speed up the convergence appears to be unnatural here. 
The current formulation is a streamlined version of a much more complicated approach that appeared in 
[33] and which took ideas from the earlier work [29]. Roughly speaking, in the hydrodynamical limit, the
short wavelength modes always have shorter relaxation times than the long wavelength modes. A direct 
implementation of this idea is extremely complicated due to the logarithmic interaction that couples short 
and long wavelength modes. Adding a strongly convex auxiliary potential W(x) shortens the relaxation time 
of the long wavelength modes, but it does not affect the short modes, i.e. the local statistics, which are our 
main interest. The analysis of the new system is much simpler since now the relaxation is faster, uniform for 
all modes. Finally, we need to compare the local statistics of the original system with those of the modified 
one. It turns out that the difference is governed by $(\nabla W)^2$, which can be directly controlled by the a-priori
estimate (2.5). 

Our method for enhancing the convexity of $\mathcal{H}$ is reminiscent of a standard convexification idea concerning
metastable states. To explain the similarity, consider a particle near one of the local minima of a double well 
potential separated by a local maximum, or energy barrier. Although the potential is not convex globally, 
one may still study a reference problem defined by convexifying the potential along with the well in which the 
particle initially resides. Before the particle reaches the energy barrier, there is no difference between these 
two problems. Thus questions concerning time scales shorter than the typical escape time can be conveniently 
answered by considering the convexified problem; in particular the escape time in the metastability problem 
itself can be estimated by using convex analysis. Our DBM problem is already convex, but not sufficiently 
convex. The modification by adding W enhances convexity without altering the local statistics. This is 
similar to the convexification in the metastability problem which does not alter events before the escape 
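time.

2.3 Some details on the proof of Theorem 2.2

The core of the proof is divided into three theorems. For the flow with generator $\widetilde{\mathcal{L}}$, we have the following estimates on the entropy and Dirichlet form.

Theorem 2.3 Consider the forward equation

$$ \partial_t q_t = \widetilde{\mathcal{L}} q_t, \qquad t \ge 0, \tag{2.26} $$

with initial condition $q_0 = q$ and with the reversible measure $\omega$. Assume that $\int q_0\, d\omega = 1$. Then we have the following estimates

$$ \partial_t D_\omega(\sqrt{q_t}) \le -\frac{1}{2\tau} D_\omega(\sqrt{q_t}) - \frac{1}{2N^2}\int \sum_{i,j=1}^N \frac{(\partial_i\sqrt{q_t} - \partial_j\sqrt{q_t})^2}{(x_i - x_j)^2}\, d\omega, \tag{2.27} $$

$$ \frac{1}{2N^2}\int_0^\infty ds \int \sum_{i,j=1}^N \frac{(\partial_i\sqrt{q_s} - \partial_j\sqrt{q_s})^2}{(x_i - x_j)^2}\, d\omega \le D_\omega(\sqrt{q}) \tag{2.28} $$

and the logarithmic Sobolev inequality

$$ S_\omega(q) \le C\tau D_\omega(\sqrt{q}) \tag{2.29} $$

with a universal constant $C$. Thus the relaxation time to equilibrium is of order $\tau$:

$$ S_\omega(q_t) \le e^{-Ct/\tau} S_\omega(q). \tag{2.30} $$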

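Proof. Denote $h = \sqrt{q}$. Then we have the equation

$$ \partial_t D_\omega(h_t) = \partial_t \frac{1}{2N}\int (\nabla h)^2 e^{-N\widetilde{\mathcal{H}}}\,dx \le -\frac{1}{2N}\int \nabla h\, (\nabla^2\widetilde{\mathcal{H}})\nabla h\, e^{-N\widetilde{\mathcal{H}}}\,dx. \tag{2.31} $$

In our case, (2.22) and (2.23) imply that the Hessian of $\widetilde{\mathcal{H}}$ is bounded from below as

$$ \nabla h\, (\nabla^2\widetilde{\mathcal{H}})\nabla h \ge \frac{C}{\tau}\sum_j (\partial_j h)^2 + \frac{1}{2N}\sum_{i,j}\frac{1}{(x_i - x_j)^2}(\partial_i h - \partial_j h)^2 \tag{2.32} $$

with some positive constant $C$. This proves (2.27) and (2.28). The rest can be proved by straightforward arguments given in the earlier part of this section. □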

The estimate (2.28) plays a key role in the next theorem. 

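Theorem 2.4 (Dirichlet form inequality) Let $q$ be a probability density, $\int q\,d\omega = 1$, and let $G:\mathbb{R}\to\mathbb{R}$ be a smooth function with compact support. Then for any $J \subset \{1,2,\ldots,N-1\}$ and any $t > 0$ we have

$$ \Big| \int \frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\, q\,d\omega - \int \frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\, d\omega \Big| \le C\Big( t\,\frac{D_\omega(\sqrt q)}{|J|}\Big)^{1/2} + C\sqrt{S_\omega(q)}\, e^{-ct/\tau}. \tag{2.33} $$

Proof. For simplicity, we assume that $J = \{1,2,\ldots,N-1\}$. Let $q_t$ satisfy

$$ \partial_t q_t = \widetilde{\mathcal{L}} q_t, \qquad t \ge 0, $$

with an initial condition $q_0 = q$. We write

$$ \int \Big[\frac1{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\Big](q-1)\,d\omega = \int\Big[\frac{1}{|J|}\sum_{i\in J}G(N(x_i - x_{i+1}))\Big](q - q_t)\,d\omega + \int\Big[\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\Big](q_t - 1)\,d\omega. \tag{2.34} $$

The second term can be estimated by (2.13), the decay of the entropy (2.30) and the boundedness of $G$; this gives the second term in (2.33).

To estimate the first term in (2.34), we use the evolution equation $\partial_t q_t = \widetilde{\mathcal{L}} q_t$ and the definition of $\widetilde{\mathcal{L}}$:

$$ \int\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\,q_t\,d\omega - \int\frac{1}{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\,q\,d\omega = \int_0^t ds\int\frac{1}{|J|}\sum_{i\in J}G'(N(x_i-x_{i+1}))\,[\partial_iq_s - \partial_{i+1}q_s]\,d\omega. $$

From the Schwarz inequality and $\partial q = 2\sqrt q\,\partial\sqrt q$, the last term is bounded by

$$ C\Big[\int_0^t ds\int_{\mathbb{R}^N}\frac{N^2}{|J|}\sum_{i\in J}\big|G'(N(x_i-x_{i+1}))\big|^2(x_i-x_{i+1})^2\,q_s\,d\omega\Big]^{1/2} \times \Big[\int_0^t ds\int_{\mathbb{R}^N}\frac{1}{N^2|J|}\sum_{i}\frac{1}{(x_i-x_{i+1})^2}\big[\partial_i\sqrt{q_s}-\partial_{i+1}\sqrt{q_s}\big]^2\,d\omega\Big]^{1/2} \le C\Big(t\,\frac{D_\omega(\sqrt q)}{|J|}\Big)^{1/2}, \tag{2.35} $$

where we have used (2.28) and that $|G'(N(x_i-x_{i+1}))|^2(x_i-x_{i+1})^2\le CN^{-2}$ due to $G$ being smooth and compactly supported. □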



Alternatively, we could have directly estimated the left hand side of (2.33) by using the total variation norm between $q\omega$ and $\omega$, which in turn could be estimated by the entropy (2.13) and the Dirichlet form using the logarithmic Sobolev inequality, i.e., by

$$ C\int|q-1|\,d\omega \le C\sqrt{S_\omega(q)} \le C\sqrt{\tau D_\omega(\sqrt q)}. \tag{2.36} $$

However, compared with this simple bound, the estimate (2.33) gains an extra factor $|J| \sim N$ in the denominator, i.e. it is in terms of Dirichlet form per particle. The improvement is due to the observable in (2.33) being of special form and we exploit the term (2.28).



The final ingredient in proving Theorem 2.2 consists of the following entropy and Dirichlet form estimates.

Theorem 2.5 Suppose that (2.22) holds. Let $a > 0$ be fixed and recall the definition of $Q = Q_a$ from (2.5). Fix a constant $\tau \ge N^{-2a}$ and consider the local relaxation measure $\omega$ with this $\tau$. Set $\psi := \omega/\mu$ and let $g_t := f_t/\psi$. Suppose there is a constant $m$ such that

$$ S(f_\tau\mu|\omega) \le CN^m. \tag{2.37} $$

Then for any $t \ge \tau N^{\varepsilon}$ the entropy and the Dirichlet form satisfy the estimates:

$$ S(g_t\omega|\omega) \le CN^2Q\tau^{-1}, \qquad D_\omega(\sqrt{g_t}) \le CN^2Q\tau^{-2}, \tag{2.38} $$

where the constants depend on $\varepsilon$ and $m$.

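Proof. The evolution of the entropy $S(f_t\mu|\omega) = S_\omega(g_t)$ can be computed explicitly by the formula [78]

$$ \partial_tS(f_t\mu|\omega) = -\frac2N\sum_j\int(\partial_j\sqrt{g_t})^2\,d\omega + \int(\mathcal{L} g_t)\,d\omega. $$

Hence we have, by using (2.25),

$$ \partial_t S(f_t\mu|\omega) = -\frac2N\sum_j\int(\partial_j\sqrt{g_t})^2\,d\omega + \int\widetilde{\mathcal{L}}g_t\,d\omega + \sum_j\int b_j\partial_jg_t\,d\omega. $$

Since $\omega$ is $\widetilde{\mathcal{L}}$-invariant and time independent, the middle term on the right hand side vanishes, and from the Schwarz inequality

$$ \partial_tS(f_t\mu|\omega) \le -D_\omega(\sqrt{g_t}) + CN\sum_j\int b_j^2g_t\,d\omega \le -D_\omega(\sqrt{g_t}) + CN^2Q\tau^{-2}. \tag{2.39} $$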

Together with the logarithmic Sobolev inequality (2.29), we have

$$ \partial_tS(f_t\mu|\omega) \le -D_\omega(\sqrt{g_t}) + CN^2Q\tau^{-2} \le -C\tau^{-1}S(f_t\mu|\omega) + CN^2Q\tau^{-2}. \tag{2.40} $$

Integrating the last inequality from $\tau$ to $t$ and using the assumption (2.37) and $t \ge \tau N^\varepsilon$, we have proved the
first inequality of (2.38). Using this result and integrating (2.39), we have



$$ \int_t^{t+\tau}D_\omega(\sqrt{g_s})\,ds \le CN^2Q\tau^{-1}. $$

By the convexity of the Hamiltonian, $D_\mu(\sqrt{f_t})$ is decreasing in $t$. Since $D_\omega(\sqrt{g_t}) \le CD_\mu(\sqrt{f_t}) + CN^2Q\tau^{-2}$,
this proves the second inequality of (2.38). □
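The integration step that turns (2.40) into the first inequality of (2.38) can be spelled out as follows. This is the standard Gronwall argument written out for convenience, not a quotation from the original proof; the shorthand $S(t)$, $K$ and the constant $C'$ are introduced here only for this display.

```latex
% Shorthand: S(t) := S(f_t \mu | \omega),  K := C N^2 Q \tau^{-2}.
% Then (2.40) reads  \partial_t S(t) \le -(C/\tau) S(t) + K.
% Multiplying by e^{Ct/\tau} and integrating from \tau to t gives
\begin{align*}
S(t) &\le e^{-C(t-\tau)/\tau}\, S(\tau) + K \int_\tau^t e^{-C(t-s)/\tau}\, ds \\
     &\le e^{-C(t-\tau)/\tau}\, C N^{m} + \frac{K\tau}{C}
      && \text{by (2.37)} \\
     &\le C N^{m} e^{-C(N^{\varepsilon}-1)} + C' N^{2} Q \tau^{-1}
      && \text{for } t \ge \tau N^{\varepsilon}.
\end{align*}
```

The first term is smaller than any power of $N$, so the polynomial bound (2.37) on the initial entropy is harmless and only the term $N^2Q\tau^{-1}$ survives.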

Finally, we complete the proof of Theorem 2.2. For any given $t > 0$ we now choose $\tau := tN^{-\varepsilon}$ and we construct the local relaxation measure $\omega$ with this $\tau$. Set $\psi = \omega/\mu$ and let $q := g_t = f_t/\psi$ be the density $q$ in Theorem 2.4. Then Theorem 2.5, Theorem 2.4 and an easy bound on the entropy $S_\omega(q) \le CN^m$ imply that

$$ \Big|\int\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\,(f_t\,d\mu - d\omega)\Big| \le CN^{\varepsilon}\sqrt{\frac{N^2Q}{|J|\,t}} + Ce^{-cN^\varepsilon}, \tag{2.41} $$

i.e., the local statistics of $f_t\mu$ and $\omega$ are the same for any initial data $f_\tau$ for which (2.37) is satisfied. Applying the same argument to the Gaussian initial data, $f_0 = f_\tau = 1$, we can also compare $\mu$ and $\omega$. We have thus proved (2.9) and hence the universality. □

3 Local semicircle law via Green function

The Wigner semicircle law asserts that (1.2) is valid in a weak limit, i.e., for any smooth test function $O$ with compact support we have

$$ \int_{\mathbb{R}}O(x)\,[\varrho_N(x) - \varrho_{sc}(x)]\,dx \to 0. \tag{3.1} $$



This means that the density of eigenvalues in a window independent of $N$ is given by the semicircle law. Our goal is to prove a local version of this result for windows slightly larger than $1/N$ and in a large deviation sense. The main object to study is the Green function of the matrix, $G(z) = [H - z]^{-1}$, $z = E + i\eta$, $E\in\mathbb{R}$, $\eta > 0$, which is related to the Stieltjes transform of the empirical measure:

$$ m(z) = m_N(z) := \frac1N\operatorname{Tr}\frac1{H-z} = \frac1N\sum_{j=1}^N\frac1{\lambda_j - z} = \int_{\mathbb{R}}\frac{\varrho_N(x)}{x-z}\,dx = \frac1N\sum_{i=1}^NG_{ii}(z). \tag{3.2} $$

We will compare it with $m_{sc}(z) := \int_{\mathbb{R}}(x-z)^{-1}\varrho_{sc}(x)\,dx$, the Stieltjes transform of the semicircle law. This is the content of the local semicircle law, Theorem 3.1 below. The key parameter is $\eta = \operatorname{Im} z$, which determines the resolution, i.e. the scale on which the local semicircle law holds.
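For numerical experiments it is convenient to have $m_{sc}$ in closed form. The sketch below assumes $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ (formula (1.2) is not reproduced in this section), for which $m_{sc}(z)$ is the root of $m^2 + zm + 1 = 0$ with positive imaginary part when $\operatorname{Im} z > 0$. It compares the closed form with a direct quadrature of the defining integral; the function names are hypothetical.

```python
import cmath
import math

def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m > 0 (requires Im z > 0)."""
    root = cmath.sqrt(z * z - 4.0)
    m = (-z + root) / 2.0
    return m if m.imag > 0 else (-z - root) / 2.0

def m_sc_quadrature(z, n=4000):
    """Midpoint rule for the integral of rho_sc(x) / (x - z), with x = 2 sin(theta)."""
    h = math.pi / n
    total = 0.0 + 0.0j
    for k in range(n):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        # rho_sc(x) dx = (2 cos(theta)^2 / pi) dtheta after the substitution
        total += (2.0 * math.cos(theta) ** 2 / math.pi) / (2.0 * math.sin(theta) - z)
    return total * h

z = 0.3 + 0.5j
print(m_sc(z), m_sc_quadrature(z))
```

The quadratic equation is equivalent to the identity $m_{sc}(z) + [m_{sc}(z) + z]^{-1} = 0$ used in the proof sketch of the local semicircle law.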

For the rest of this paper, we will assume that the probability distributions of the matrix elements satisfy the following subexponential condition:

$$ \mathbb{P}(|v_{ij}| > x) \le C_0\exp(-x^\vartheta), \qquad x > 0, \tag{3.3} $$

with some positive constants $C_0$, $\vartheta$, where we set $v_{ij} = \sqrt N h_{ij}$. This condition can be relaxed to (1.8) via a cutoff argument, but we will not discuss such technical details here.

Theorem 3.1 (Local semicircle law) [38, Theorem 2.1] Let $H = (h_{ij})$ be a Hermitian or symmetric $N\times N$ random matrix with $\mathbb{E} h_{ij} = 0$, $1\le i,j\le N$. Suppose that the distributions of the matrix elements have a uniformly subexponential decay (3.3). Then there exist positive constants $A_0 > 1$, $C$, $c$ and $\phi < 1$ such that with

$$ L := A_0\log\log N \tag{3.4} $$

the following estimates hold for any sufficiently large $N \ge N_0(C_0,\vartheta)$:

(i) The Stieltjes transform of the empirical eigenvalue distribution of $H$ satisfies

$$ \mathbb{P}\Big( \bigcup_{z\in S_L} \Big\{ |m(z) - m_{sc}(z)| \ge \frac{(\log N)^{4L}}{N\eta} \Big\} \Big) \le C\exp\big[-c(\log N)^{\phi L}\big], \tag{3.5} $$

where

$$ S_L := \big\{ z = E + i\eta \,:\, |E| \le 5,\ N^{-1}(\log N)^{10L} < \eta \le 10 \big\}. \tag{3.6} $$

(ii) The individual matrix elements of the Green function satisfy

$$ \mathbb{P}\Big( \bigcup_{z\in S_L} \Big\{ \max_{i,j} |G_{ij}(z) - \delta_{ij}m_{sc}(z)| \ge (\log N)^{4L}\Big( \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{N\eta}} + \frac{1}{N\eta}\Big) \Big\} \Big) \le C\exp\big[-c(\log N)^{\phi L}\big]. \tag{3.7} $$

Theorem 3.1 is the strongest form of the local semicircle law that gives optimal error estimates (modulo logarithmic factors) on the smallest possible scale, which is valid uniformly in the spectrum including the edge, and which controls not only the Stieltjes transform but also individual matrix elements of the resolvent. This theorem is the final result of successive improvements in [31, 32, 36, 37, 38] of our first local semicircle law in [30].

The local semicircle estimates imply that the $j$-th eigenvalue, $\lambda_j$, is very close to its classical location $\gamma_j$, defined in (2.4):

Corollary 3.2 (Rigidity of eigenvalues) [38, Theorem 2.2] Under the assumptions of Theorem 3.1 we have

$$ \mathbb{P}\Big\{\exists j \,:\, |\lambda_j - \gamma_j| \ge (\log N)^L\,[\min(j,N-j+1)]^{-1/3}N^{-2/3}\Big\} \le C\exp\big[-c(\log N)^{\phi L}\big] \tag{3.8} $$

for any sufficiently large $N \ge N_0$.

This corollary in particular proves the a-priori estimate (2.5) for any a < 1/2. 

Corollary 3.2 is a simple consequence of the Helffer-Sjöstrand formula which translates information on the Stieltjes transform of the empirical measure first to the counting function and then to the locations of eigenvalues. The formula yields the representation

$$ f(\lambda) = \frac1{2\pi}\int_{\mathbb{R}^2}\frac{\partial_{\bar z}\tilde f(x+iy)}{\lambda - x - iy}\,dx\,dy = \frac1{2\pi}\int_{\mathbb{R}^2}\frac{iyf''(x)\chi(y) + i\big(f(x)+iyf'(x)\big)\chi'(y)}{\lambda-x-iy}\,dx\,dy \tag{3.9} $$

for any real valued $C^2$ function $f$ on $\mathbb{R}$, where $\tilde f(x+iy) := (f(x)+iyf'(x))\chi(y)$, $\partial_{\bar z} := \partial_x + i\partial_y$, and $\chi(y)$ is any smooth cutoff function with bounded derivatives and supported in $[-1,1]$ with $\chi(y) = 1$ for $|y| \le 1/2$. In the applications, $f$ will be a smoothed version of the characteristic functions of spectral intervals so that $\sum_j f(\lambda_j)$ counts eigenvalues in that interval. The details of the argument can be found in [29].

We also mention that Theorem 3.1 immediately implies complete delocalization of each eigenvector of the Wigner matrix:

Corollary 3.3 (Complete delocalization) Let $u_1, u_2, \ldots$ be the $\ell^2$-normalized eigenvectors of $H$. Under the assumptions of Theorem 3.1 we have

$$ \mathbb{P}\Big\{\exists\alpha \,:\, \|u_\alpha\|_\infty \ge \frac{(\log N)^{10L}}{N^{1/2}}\Big\} \le C\exp\big[-c(\log N)^{\phi L}\big] \tag{3.10} $$

for any sufficiently large $N\ge N_0$.

For the proof, notice that (3.7) implies the bound $|G_{jj}(z)| = O(1)$ with very high probability for any $z\in S_L$. Therefore,

$$ C \ge \operatorname{Im}G_{jj}(\lambda_\alpha + i\eta) = \sum_\beta\frac{\eta\,|u_\beta(j)|^2}{(\lambda_\beta-\lambda_\alpha)^2+\eta^2} \ge \frac{|u_\alpha(j)|^2}{\eta}. $$

The original proof of delocalization of eigenvectors was derived from the Stieltjes transform of the empirical measure [30, 32], motivated by a question posed by T. Spencer.

Sketch of the proof of Theorem 3.1. For simplicity, we will assume here that $E = \operatorname{Re}z$ is away from the spectral edges. The starting point is the following well known formula. Let $A$, $B$, $C$ be $n\times n$, $m\times n$ and $m\times m$ matrices and set

$$ D := \begin{pmatrix}A & B^*\\ B & C\end{pmatrix}. \tag{3.11} $$

Then for any $1\le i,j\le n$, we have

$$ (D^{-1})_{ij} = \big[(A - B^*C^{-1}B)^{-1}\big]_{ij}. \tag{3.12} $$
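Formula (3.12) with $n = 1$ expresses a diagonal resolvent entry through the resolvent of a minor, which is how it is used for the Green function. A minimal numerical check is sketched below with a small real symmetric matrix with Gaussian entries of variance $1/N$ (an illustrative choice) and a hand-written Gauss-Jordan inverse, so that only the standard library is needed. All function names are hypothetical.

```python
import random

def inverse(a):
    """Gauss-Jordan inverse of a small square complex matrix, with partial pivoting."""
    n = len(a)
    aug = [list(map(complex, row)) + [1.0 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [entry / p for entry in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def wigner_matrix(n, rng):
    """Real symmetric matrix with centered Gaussian entries of variance 1/n."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            h[i][j] = h[j][i] = rng.gauss(0.0, 1.0) / n ** 0.5
    return h

def resolvent(h, z):
    """G(z) = (H - z)^{-1}."""
    n = len(h)
    return inverse([[h[k][l] - (z if k == l else 0.0) for l in range(n)] for k in range(n)])

def schur_diagonal_entry(h, z, i):
    """1 / (h_ii - z - sum_{k,l != i} h_ik G^{(i)}_kl h_li), G^{(i)} the resolvent of the minor."""
    keep = [k for k in range(len(h)) if k != i]
    g_minor = resolvent([[h[k][l] for l in keep] for k in keep], z)
    quad = sum(h[i][k] * g_minor[a][b] * h[l][i]
               for a, k in enumerate(keep) for b, l in enumerate(keep))
    return 1.0 / (h[i][i] - z - quad)

def check(n=6, z=0.2 + 0.1j, seed=0):
    """Largest discrepancy between G_ii and its Schur complement expression."""
    h = wigner_matrix(n, random.Random(seed))
    g = resolvent(h, z)
    return max(abs(g[i][i] - schur_diagonal_entry(h, z, i)) for i in range(n))

print(check())
```

The quadratic form in the denominator depends on row $i$ only through the entries $h_{ik}$, which are independent of the minor; this independence is what makes the identity useful in the argument.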
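Applying this formula to the resolvent matrix $G = (H-z)^{-1}$, we have

$$ G_{ii} = \frac{1}{h_{ii} - z - \sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li}} = \frac1{h_{ii} - z - \mathbb{E}_i\sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li} - Z_i}, \tag{3.13} $$

where

$$ Z_i := \sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li} - \mathbb{E}_i\sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li}. \tag{3.14} $$

Here $G^{(i)}$ denotes the resolvent of the $(N-1)\times(N-1)$ minor of $H$ after removing the $i$-th row and column and $\mathbb{E}_i$ denotes the expectation with respect to the entries in the $i$-th row and column. Since $G^{(i)}$ is independent of $h_{ik}$ and $\mathbb{E}_ih_{ik}h_{li} = \frac1N\delta_{kl}$, we have

$$ \mathbb{E}_i\sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li} = \frac1N\sum_{k\ne i}G^{(i)}_{kk} = \frac1N\sum_kG_{kk} + O\Big(\frac1{N}\Big). $$

Here we used the interlacing property of eigenvalues between a matrix and its minors, which implies that

$$ \Big|\frac1N\operatorname{Tr}G - \frac1N\operatorname{Tr}G^{(i)}\Big| = |m(z) - m^{(i)}(z)| \le \frac{C}{N\eta}, \qquad \eta = \operatorname{Im}z > 0. \tag{3.15} $$

Defining $v_i := G_{ii} - m_{sc}$, we thus have

$$ v_i = G_{ii} - m_{sc} = \frac{1}{-z - m_{sc} - \big(\frac1N\sum_jv_j + Z_i - h_{ii} + O(N^{-1})\big)} - m_{sc}. \tag{3.16} $$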

Expanding the denominator, using the identity $m_{sc}(z) + [m_{sc}(z)+z]^{-1} = 0$ and neglecting the error terms $h_{ii} + O(N^{-1}) = O(N^{-1/2})$, we have

$$ v_i = m_{sc}^2\Big(\frac1N\sum_jv_j + Z_i\Big) + m_{sc}^3\Big(\frac1N\sum_jv_j+Z_i\Big)^2 + \cdots \tag{3.17} $$

Summing up $i$ and dividing by $N$, we obtain, modulo negligible errors,

$$ [v] := \frac1N\sum_iv_i = m_{sc}^2[v] + m_{sc}^3[v]^2 + m_{sc}^2[Z] + O\Big(\frac1N\sum_i|Z_i|^2\Big), \qquad [Z] := \frac1N\sum_iZ_i. \tag{3.18} $$
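The expansion (3.17) is a geometric series: with $x := \frac1N\sum_j v_j + Z_i$ and $-z - m_{sc} = 1/m_{sc}$, the right hand side of (3.16) without the neglected errors equals $m_{sc}/(1 - m_{sc}x) - m_{sc} = \sum_{k\ge1} m_{sc}^{k+1}x^k$. The following sketch checks the first two terms numerically, assuming the closed form of $m_{sc}$ for the semicircle density on $[-2,2]$; the function names are hypothetical.

```python
import cmath

def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m > 0 (requires Im z > 0)."""
    root = cmath.sqrt(z * z - 4.0)
    m = (-z + root) / 2.0
    return m if m.imag > 0 else (-z - root) / 2.0

def v_exact(z, x):
    """1 / (-z - m_sc - x) - m_sc, i.e. (3.16) with the small error terms dropped."""
    m = m_sc(z)
    return 1.0 / (-z - m - x) - m

def v_expanded(z, x, order=2):
    """Truncation m_sc^2 x + m_sc^3 x^2 + ... of the expansion (3.17)."""
    m = m_sc(z)
    return sum(m ** (k + 1) * x ** k for k in range(1, order + 1))

z, x = 0.5 + 0.1j, 0.01 + 0.02j
print(abs(v_exact(z, x) - v_expanded(z, x, order=1)))   # of order |x|^2
print(abs(v_exact(z, x) - v_expanded(z, x, order=2)))   # of order |x|^3
```

Since $|m_{sc}| \le 1$, the truncation error after order $k$ is at most $|x|^{k+1}/(1-|x|)$, which is why only the size of $x$ matters for the validity of the expansion.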
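To estimate $Z_i$, we compute its second moment

$$ \mathbb{E}|Z_i|^2 = \sum_{k,l\ne i}\ \sum_{k',l'\ne i}\mathbb{E}\Big(\big[h_{ik}G^{(i)}_{kl}h_{li} - \mathbb{E}_ih_{ik}G^{(i)}_{kl}h_{li}\big]\,\overline{\big[h_{ik'}G^{(i)}_{k'l'}h_{l'i} - \mathbb{E}_ih_{ik'}G^{(i)}_{k'l'}h_{l'i}\big]}\Big). \tag{3.19} $$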



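Since $\mathbb{E}h = 0$, the non-zero contributions to this sum come from index combinations when all $h$ and $\bar h$ are paired. For pedagogical simplicity, assume that $\mathbb{E}h^2 = 0$; this can be achieved, for example, if the distributions of the real and imaginary parts are the same. Then the $h$ factors in the above expression have to be paired in such a way that $h_{ik} = h_{ik'}$ and $h_{il} = h_{il'}$, i.e., $k = k'$, $l = l'$. Note that pairing $h_{ik} = h_{il}$ would give zero because the expectation is subtracted. The result is

$$ \mathbb{E}|Z_i|^2 = \frac{1}{N^2}\sum_{k,l\ne i}\mathbb{E}\big|G^{(i)}_{kl}\big|^2 + \frac{m_4-1}{N^2}\sum_{k\ne i}\mathbb{E}\big|G^{(i)}_{kk}\big|^2, \tag{3.20} $$

where $m_4 = \mathbb{E}|\sqrt Nh|^4$ is the fourth moment of the single entry distribution. The first term can be computed

$$ \frac1{N^2}\sum_{k,l\ne i}\big|G^{(i)}_{kl}\big|^2 = \frac1{N^2}\sum_{k\ne i}\big(|G^{(i)}|^2\big)_{kk} = \frac{1}{N\eta}\,\frac1N\sum_{k\ne i}\operatorname{Im}G^{(i)}_{kk} = \frac{1}{N\eta}\operatorname{Im}m^{(i)}(z). \tag{3.21} $$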



The second term in (3.20) can be estimated by a similar bound. These estimates confirm that the size of $Z_i$, at least in the second moment sense, is roughly

$$ |Z_i| \lesssim \frac{1}{\sqrt{N\eta}}. \tag{3.22} $$

Neglecting the $[v]^2$ term in (3.18) and using that $|1 - m_{sc}^2| \ge c$ away from the spectral edge for some positive $c$, we thus have $|m(z) - m_{sc}(z)| \le C(N\eta)^{-1/2}$. A similar but more involved argument gives the same bound for individual $v_i$'s, showing the estimate (3.7) for the diagonal elements $G_{ii}$. The estimate for the off-diagonal terms, $G_{ij}$, $i\ne j$, is obtained from the identity $G_{ij} = G_{jj}G^{(j)}_{ii}\,[Z_{ij} - h_{ij}]$ which can be proved using (3.12). Here $Z_{ij}$ is defined analogously to (3.14) as

$$ Z_{ij} := \sum_{k,l\ne i,j}h_{ik}G^{(ij)}_{kl}h_{lj} - \mathbb{E}_{ij}\sum_{k,l\ne i,j}h_{ik}G^{(ij)}_{kl}h_{lj}, $$

where $G^{(ij)}$ is the resolvent of the $(N-2)\times(N-2)$ minor of $H$ after removing the $i$-th and $j$-th row and column. The bound (3.22) holds for $Z_{ij}$ as well.

The estimate for $[v] = m - m_{sc}$, the average of $v_i$'s, is of order $(N\eta)^{-1}$ in (3.5), i.e. it is better than the $(N\eta)^{-1/2}$ estimate for the individual matrix elements in (3.7). The key mechanism for this improvement is the cancellation of the $Z_i$'s in their average $[Z]$. If the $Z_i$'s were independent, we would gain a factor $N^{-1/2}$ by the central limit theorem. But the $Z_i$'s are correlated and the cancellation takes the following form:

Lemma 3.4 (Fluctuation Averaging Lemma) With the notations of Theorem 3.1, for any $\varepsilon > 0$ we have

$$ \mathbb{P}\Big(\Big|\frac1N\sum_{i=1}^NZ_i\Big| \ge \frac{N^\varepsilon}{N\eta}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big] \tag{3.23} $$

for sufficiently large $N$.



Using this lemma and (3.18), we have proved the stronger estimate for $[v]$. This completes the sketch of the proof of the local semicircle law, Theorem 3.1. □

4 The Green function comparison theorems



We now state the Green function comparison theorem, Theorem 4.1. It will quickly lead to Theorem 4.2 
stating that the correlation functions of eigenvalues of two matrix ensembles are identical on a scale smaller
than $1/N$ provided that the first four moments of all matrix elements of these two ensembles are almost the
same. We will state a limited version for real Wigner matrices for simplicity of presentation. 

Theorem 4.1 (Green function comparison) [36, Theorem 2.3] Suppose that we have two $N\times N$ Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (3.3). We assume that the first four moments of $v_{ij}$ and $w_{ij}$ are close to each other in the sense that

$$ \big|\mathbb{E}v_{ij}^s - \mathbb{E}w_{ij}^s\big| \le N^{-\delta-2+s/2}, \qquad 1\le s\le4, \tag{4.1} $$

holds for some $\delta > 0$. Then there are positive constants $C_1$ and $\varepsilon$, depending on $\vartheta$ and $C_0$ from (3.3), such that for any $\eta$ with $N^{-1-\varepsilon}\le\eta\le N^{-1}$ and for any $z_1, z_2$ with $\operatorname{Im}z_j = \pm\eta$, $j = 1,2$, we have

$$ \lim_{N\to\infty}\Big[\mathbb{E}\operatorname{Tr}G^{(v)}(z_1)\operatorname{Tr}G^{(v)}(z_2) - \mathbb{E}\operatorname{Tr}G^{(w)}(z_1)\operatorname{Tr}G^{(w)}(z_2)\Big] = 0, \tag{4.2} $$

where $G^{(v)}$ and $G^{(w)}$ denote the Green functions of $H^{(v)}$ and $H^{(w)}$.



The matching condition (4.1) is essentially the same as the one that appeared in [67]. Here we formulated
Theorem 4.1 for a product of two traces of the Green function, but the result holds for a large class of
smooth functions depending on several individual matrix elements of the Green functions as well, see [36]
for the precise statement. (The matching condition (4.1) is slightly weaker than in [36], but the proof in [36]
without any change yields this slightly stronger version.) This general version of Theorem 4.1 implies that the
correlation functions of the two ensembles at the scale $1/N$ are identical:

Theorem 4.2 (Correlation function comparison) [36, Theorem 6.4] Suppose the assumptions of Theorem 4.1 hold. Let $p^{(n)}_{v,N}$ and $p^{(n)}_{w,N}$ be the $n$-point functions of the eigenvalues w.r.t. the probability law of the matrix $H^{(v)}$ and $H^{(w)}$, respectively. Then for any $|E| < 2$, any $n\ge1$ and any compactly supported continuous test function $O:\mathbb{R}^n\to\mathbb{R}$ we have

$$ \lim_{N\to\infty}\int_{\mathbb{R}^n}d\alpha_1\ldots d\alpha_n\,O(\alpha_1,\ldots,\alpha_n)\,\big(p^{(n)}_{v,N} - p^{(n)}_{w,N}\big)\Big(E+\frac{\alpha_1}N,\ldots,E+\frac{\alpha_n}N\Big) = 0. \tag{4.3} $$

The basic idea for proving Theorem 4.1 is similar to Lindeberg's proof of the central limit theorem, where the random variables are replaced one by one with a Gaussian one. We will replace the matrix elements $v_{ij}$ with $w_{ij}$ one by one and estimate the effect of this change on the resolvent by a resolvent expansion. The idea of applying Lindeberg's method in random matrices was recently used by Chatterjee [13] for comparing the traces of the Green functions; the idea was also used by Tao and Vu [67] in the context of comparing individual eigenvalue distributions. There are two main differences between our method and the one that appeared in [67]:

(i) We compare the statistics of eigenvalues of two different ensembles near fixed energies while [67] compared the statistics of the $j_1, j_2, \ldots, j_k$-th eigenvalues for fixed labels $j_1, j_2, \ldots, j_k$.

(ii) There is a serious difficulty in the approach [67] concerning possible resonances of neighboring eigenvalues that may render the expansion unstable.

The Green function method eliminates this difficulty completely and Theorem 4.1 is a simple corollary of the Green function estimate Theorem 3.1.

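For a sketch of the proof, fix a bijective ordering map on the index set of the independent matrix elements,

$$ \varphi : \{(i,j) \,:\, 1\le i\le j\le N\} \to \{1,\ldots,\gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}2, $$

and denote by $H_\gamma$ the Wigner matrix whose matrix elements $h_{ij}$ follow the $w$-distribution if $\varphi(i,j)\le\gamma$ and they follow the $v$-distribution otherwise; in particular $H_0 = H^{(v)}$ and $H_{\gamma(N)} = H^{(w)}$.

Consider the telescopic sum of differences of expectations (we present only one resolvent for simplicity of the presentation):

$$ \mathbb{E}\Big(\frac1N\operatorname{Tr}\frac1{H^{(w)}-z}\Big) - \mathbb{E}\Big(\frac1N\operatorname{Tr}\frac{1}{H^{(v)}-z}\Big) = \sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb{E}\Big(\frac1N\operatorname{Tr}\frac1{H_\gamma-z}\Big) - \mathbb{E}\Big(\frac1N\operatorname{Tr}\frac1{H_{\gamma-1}-z}\Big)\Big]. \tag{4.4} $$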

Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma\ge1$ and let $(i,j)$ be determined by $\varphi(i,j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that these two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements and they can be written as

$$ H_{\gamma-1} = Q + \frac1{\sqrt N}V, \qquad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)}, \quad v_{ji} := \bar v_{ij}, $$

$$ H_\gamma = Q+\frac1{\sqrt N}W, \qquad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)}, \quad w_{ji} := \bar w_{ij}, $$

with a matrix $Q$ that has zero matrix element at the $(i,j)$ and $(j,i)$ positions. By the resolvent expansion,

$$ S_{\gamma-1} = R - N^{-1/2}RVR + \ldots + N^{-2}(RV)^4R - N^{-5/2}(RV)^5S_{\gamma-1}, \qquad R := \frac1{Q-z}, \quad S_{\gamma-1} := \frac{1}{H_{\gamma-1}-z}, $$

and a similar expression holds for the resolvent $S_\gamma$ of $H_\gamma$. From the local semicircle law for individual matrix elements (3.7), the matrix elements of all Green functions $R$, $S_{\gamma-1}$, $S_\gamma$ are bounded by $CN^\varepsilon$ for any $\varepsilon > 0$. By assumption (4.1), the difference between the expectation of matrix elements of $S_{\gamma-1}$ and $S_\gamma$ is of order $N^{-2-\delta+C\varepsilon}$. Since the number of steps, $\gamma(N)$, is of order $N^2$, the difference in (4.4) is of order $N^2N^{-2-\delta+C\varepsilon}\ll1$, and this proves Theorem 4.1 for a single resolvent. It is very simple to turn this heuristic argument into a rigorous proof and to generalize it to the product of several resolvents. The real difficulty is the input that the local semicircle law holds for a general class of Wigner matrices.
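The resolvent expansion used in this sketch is an exact identity, obtained by iterating $S = R - N^{-1/2}RVS$ five times. It can be checked numerically already for $2\times2$ matrices, where $Q$ is diagonal and $V$ carries the single off-diagonal pair. The sketch below is such a check with arbitrary illustrative numbers and hypothetical function names.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def shifted(h, z):
    """h - z * identity."""
    return [[h[i][j] - (z if i == j else 0.0) for j in range(2)] for i in range(2)]

def resolvent_expansion(q, v, z, n=2):
    """Return (expansion, S): the fifth order expansion with exact remainder, and
    S = (Q + n^{-1/2} V - z)^{-1} computed directly."""
    r = inv2(shifted(q, z))
    h_prev = [[q[i][j] + v[i][j] / n ** 0.5 for j in range(2)] for i in range(2)]
    s = inv2(shifted(h_prev, z))
    rv = matmul(r, v)
    total = [row[:] for row in r]                      # zeroth order term R
    power = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, 6):
        power = matmul(power, rv)                      # (RV)^k
        term = matmul(power, s if k == 5 else r)       # the last term carries S, not R
        coeff = (-1.0) ** k * n ** (-k / 2.0)
        total = [[total[i][j] + coeff * term[i][j] for j in range(2)] for i in range(2)]
    return total, s

q = [[0.3, 0.0], [0.0, -0.7]]       # zero at the (1,2) and (2,1) positions
v = [[0.0, 1.2], [1.2, 0.0]]        # the single replaced pair of entries
expansion, s = resolvent_expansion(q, v, 0.1 + 0.5j)
print(max(abs(expansion[i][j] - s[i][j]) for i in range(2) for j in range(2)))
```

The terms up to fourth order depend on the entry $v_{ij}$ only through its first four moments after taking expectation, and the remainder carries the factor $N^{-5/2}$; this is where the matching condition (4.1) enters.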



5 Universality for Wigner matrices: putting it together 

In this short section we put the previous information together to prove Theorem 1.1. We first focus on the case when $b_N$ is independent of $N$. Recall that Theorem 2.1 states that the correlation functions of the Gaussian divisible ensemble,

$$ H_t = e^{-t/2}H_0 + (1-e^{-t})^{1/2}U, \tag{5.1} $$

where $H_0$ is the initial Wigner matrix and $U$ is an independent standard GUE (or GOE) matrix, are given by the corresponding GUE (or GOE) for $t\ge N^{-2a+\varepsilon}$ provided that the a-priori estimate (2.5) holds for the solution $f_t$ of the forward equation (2.2) with some exponent $a>0$. Since the rigidity of eigenvalues, Corollary 3.2, holds uniformly for all Wigner matrices, we have proved (2.5) for $a = 1/2-\varepsilon$ with any $\varepsilon>0$. From the evolution of the OU process (2.1) for $v_{ij} = N^{1/2}h_{ij}$ we have

$$ \big|\mathbb{E}v_{ij}^s(t) - \mathbb{E}v_{ij}^s(0)\big| \le Ct = CN^{-1+3\varepsilon} \tag{5.2} $$

for $s = 3,4$ and with the choice of $t = N^{-1+3\varepsilon}$. Furthermore, $\mathbb{E}v_{ij}^s(t)$ are independent of $t$ for $s=1,2$ due to $\mathbb{E}v_{ij}(0) = 0$ and $\mathbb{E}v_{ij}^2(t) = 1$. Hence (4.1) is satisfied for the matrix elements of $H_t$ and $H_0$ and we can thus use Theorem 4.2 to conclude that the correlation functions of $H_t$ and $H_0$ are identical at the scale $1/N$. Since the correlation functions of $H_t$ are given by the corresponding Gaussian case, we have proved Theorem 1.1 under the condition that the probability distribution of the matrix elements decays subexponentially. Finally, we need a technical cutoff argument to relax the decay condition which we omit here (see Section 7 in [26]).
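The moment bound (5.2) can be made explicit in the real symmetric case. The sketch below assumes that each rescaled entry evolves as $v(t) = e^{-t/2}v(0) + (1-e^{-t})^{1/2}g$ with an independent standard real Gaussian $g$, which is what (5.1) suggests entrywise, and that $\mathbb{E}v(0) = 0$, $\mathbb{E}v(0)^2 = 1$. The initial moments $m_3$, $m_4$ and the value $\delta = 0.1$ in the last check are hypothetical.

```python
import math

def ou_moments(t, m3, m4):
    """Second, third and fourth moments of v(t) = a v(0) + b g with
    a^2 = exp(-t), b^2 = 1 - exp(-t), g standard Gaussian independent of v(0)."""
    a2 = math.exp(-t)
    b2 = 1.0 - a2
    second = a2 + b2                                   # stays equal to 1
    third = a2 ** 1.5 * m3                             # Gaussian part has no third moment
    fourth = a2 * a2 * m4 + 6.0 * a2 * b2 + 3.0 * b2 * b2
    return second, third, fourth

N, eps = 10 ** 6, 0.05
t = N ** (-1.0 + 3.0 * eps)
m3, m4 = 1.0, 5.0
second, third, fourth = ou_moments(t, m3, m4)
print(t, abs(third - m3), abs(fourth - m4))
```

Both deviations are linear in $t$, so for $t = N^{-1+3\varepsilon}$ with small $\varepsilon$ the third moment changes by far less than the $N^{-\delta-1/2}$ allowed in (4.1), and the fourth moment by far less than $N^{-\delta}$.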

The argument for $N$-dependent $b = b_N$ in the range $b_N\ge N^{-1+\xi}$, $\xi>0$, is slightly different. For such a small $b_N$, (2.8) could be established only for relatively large times, $t\ge N^{-\xi/8}$. We cannot therefore compare $H_0$ with $H_t$ directly, since the deviation of the third moments of $v_{ij}(0)$ and $v_{ij}(t)$ in (5.2) would not satisfy (4.1). Instead, we construct an auxiliary Wigner matrix $\widehat H_0$ such that up to the third moment its time evolution $\widehat H_t$ under the OU flow (5.1) matches exactly the original matrix $H_0$ and the fourth moments are close even for $t$ of order $N^{-\xi/8}$ (see Lemma 3.4 of [37]). Theorem 2.1 will then be applied for $\widehat H_t$, and Theorem 4.1 can be used to compare $\widehat H_t$ and $H_0$.

We finally discuss the extension of Theorem 1.1 without averaging in $E'$. For Hermitian matrices, with the notations of Theorem 1.1, for any fixed $|E|<2$ we have that

$$ \lim_{N\to\infty}\int_{\mathbb{R}^k}d\alpha_1\ldots d\alpha_k\,O(\alpha_1,\ldots,\alpha_k)\,\frac{1}{\varrho_{sc}(E)^k}\big(p^{(k)}_{F,N}-p^{(k)}_{\mu,N}\big)\Big(E+\frac{\alpha_1}{N\varrho_{sc}(E)},\ldots,E+\frac{\alpha_k}{N\varrho_{sc}(E)}\Big) = 0. \tag{5.3} $$

This convergence was first proved in Theorem 1.1 of [27] for matrices with distribution which is $C$-times differentiable for some universal constant $C$. For a general distribution it was stated as Theorem 5 in [71]. Although the proof in [71] took a slightly different path, this generalization is an immediate corollary of our previous results [35]. Recall our three step approach reviewed in the introduction. If we substitute Step 2b with Step 2a, then all our results in the Hermitian case would need no time average. More precisely, Proposition 3.1 of [27] asserts that the bulk universality in the Hermitian case holds at a fixed energy for the Gaussian convolution matrix $H_t$ with $t\sim N^{-1+\delta}$. The first four moments of $H_t$ and $H_0$ are sufficiently close to apply directly the Green function comparison theorem for correlation functions (Theorem 4.2 in this article). This concludes the bulk universality of the original matrix $H_0$ at a fixed energy, which is Theorem 5 in [71]. In fact, our theory implies the same result for generalized Hermitian matrices (defined in Section 8) with finite $4+\varepsilon$ moments.



6 Beta ensemble: Rigidity estimates 

The general $\beta$-ensemble with a potential $V$ is defined by the probability measure $\mu = \mu_{\beta,V}$ (1.7) on $N$ ordered real points $\lambda_1\le\ldots\le\lambda_N$. We let $\mathbb{P}_\mu$ and $\mathbb{E}_\mu$ denote the probability and the expectation with respect to $\mu$. For simplicity of presentation we assume that the potential $V$ is convex, i.e.,

$$ \vartheta := \frac12\inf_{x\in\mathbb{R}}V''(x) > 0, \tag{6.1} $$

the equilibrium density $\varrho(s)$ is supported on a single interval $[A,B]\subset\mathbb{R}$ and satisfies (1.10) (for the general case, see [11]). The Gaussian case corresponds to $V(x) = x^2/2$, in which case the equilibrium density is the semicircle law, $\varrho_{sc}$, given by (1.2). Our main result concerning the universality is Theorem 1.2 and a similar statement holds for the universality of the gap distributions directly. In fact, the proof of Theorem 1.2 goes via the gap distribution as we now explain.

Similarly to (2.4) we again denote by $\gamma_k$ the classical location of the $k$-th point w.r.t. the limiting equilibrium density $\varrho(s)$, i.e. $\gamma_k$ is defined by

$$ \int_{-\infty}^{\gamma_k}\varrho(s)\,ds = \frac kN. \tag{6.2} $$

The first step to prove Theorem 1.2 is the following theorem which provides a rigidity estimate on the location of each individual point in the bulk almost down to the optimal scale $1/N$. In the following, we will denote $[\![x,y]\!] = \mathbb{N}\cap[x,y]$.

Theorem 6.1 [10, Theorem 3.1] Fix any $\alpha,\varepsilon>0$ and assume that (6.1) holds. Then there are constants $\delta, c_1, c_2>0$ such that for any $N\ge1$ and $k\in[\![\alpha N,(1-\alpha)N]\!]$,

$$ \mathbb{P}_\mu\big(|\lambda_k-\gamma_k| > N^{-1+\varepsilon}\big) \le c_1e^{-c_2N^\delta}. $$



The first ingredient to prove Theorem 6.1 is an analysis of the loop equation following Johansson [47] and Shcherbina [61]. The equilibrium density $\varrho$, for a convex potential $V$, is given by

$$ \varrho(t) = \frac1\pi\, r(t)\sqrt{(t-A)(B-t)}\,\mathbf{1}_{[A,B]}(t), \tag{6.3} $$

where $r$ is a real function that can be extended to an analytic function in $\mathbb{C}$ and $r$ has no zero in $\mathbb{R}$. Denote by $s(z) := -2r(z)\sqrt{(A-z)(B-z)}$ where the square root is defined such that its asymptotic value is $z$ as $z\to\infty$. Recall that the density is the one-point correlation function which is characterized by

$$ \int_{\mathbb{R}}d\lambda_1\,O(\lambda_1)\,\varrho^{(N)}_1(\lambda_1) = \int_{\mathbb{R}^N}O(\lambda_1)\,d\mu(\lambda), \qquad \lambda = (\lambda_1,\lambda_2,\ldots,\lambda_N). \tag{6.4} $$

Let $m_N$ and $m$ be the Stieltjes transforms of the density $\varrho_1^{(N)}$ and the equilibrium density $\varrho$, respectively. Notice that in Section 3 we have used $m = m_N$ to denote the Stieltjes transform of the empirical measure (3.2); here $m_N$ denotes the ensemble average of the analogous quantity.
Define the analytic functions

$$ b_N(z) := \int_{\mathbb{R}}\frac{V'(z)-V'(t)}{z-t}\,\big(\varrho_1^{(N)}-\varrho\big)(t)\,dt $$

and $c_N(z) := \frac1{N^2}k_N(z) + \frac1N\Big(\frac2\beta-1\Big)m_N'(z)$, where $k_N(z) := \operatorname{var}_\mu\Big(\sum_{k=1}^N\frac1{z-\lambda_k}\Big)$. Here for complex random variables $X$ we use the definition that $\operatorname{var}(X) = \mathbb{E}(X^2)-\mathbb{E}(X)^2$.

The equation used by Johansson (which can be obtained by a change of variables in (6.4) [47] or by integration by parts [61]) is a variation of the loop equation (see, e.g., [41]) used in the physics literature and it takes the form

$$ (m_N-m)^2 + s\,(m_N-m) + b_N = c_N. \tag{6.5} $$

Equation (6.5) expresses the difference $m_N - m$ in terms of $(m_N-m)^2$, $b_N$ and $c_N$. In the regime where $|m_N - m|$ is small, we can neglect the quadratic term. The term $b_N$ is of the same order as $|m_N-m|$ and is difficult to treat. As observed in [2, 61], for analytic $V$, this term vanishes when we perform a contour integration. So we have roughly the relation

$$ (m_N-m) \sim \frac1{N^2}\operatorname{var}_\mu\Big(\sum_{k=1}^N\frac{1}{z-\lambda_k}\Big), \tag{6.6} $$

where we dropped the less important error involving $m_N'(z)/N$ due to the extra $1/N$ factor. In the convex setting, the variance can be estimated by the logarithmic Sobolev inequality and we immediately obtain an estimate on $m_N - m$. We then use the Helffer-Sjöstrand formula, see (3.9), to estimate the locations of the particles. This will provide us with an accuracy of order $N^{-1/2}$ for $\mathbb{E}_\mu\lambda_k-\gamma_k$. This argument gives only an estimate on the expectation of the locations of the particles since we only have information on the averaged quantity, $m_N$. Although it is tempting to use this new accuracy information on the particles to estimate the variance again in (6.6), the information on the expectation of $\lambda_k$ alone is very difficult to use in a bootstrap argument. To estimate the variance of a non-trivial function of $\lambda_k$ we need high probability estimates on $\lambda_k$.

The key idea in this section is the observation that the accuracy information on the $\lambda$'s can be used to improve the local convexity of the measure $\mu$ in the direction involving the differences of $\lambda$'s. To explain this idea, we compute the Hessian of the Hamiltonian of $\mu$:



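$$\langle v, \nabla^2\mathcal{H}(\lambda) v\rangle = \frac{1}{2}\sum_j V''(\lambda_j) v_j^2 + \frac{1}{N}\sum_{i<j} \frac{(v_i - v_j)^2}{(\lambda_i - \lambda_j)^2}.$$

The naive lower bound on $\nabla^2\mathcal{H}$ comes from the convexity of $V$ alone and is of order one, but for a typical $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ it is in fact much better in most directions. To see this effect, suppose we know $|\lambda_i - \lambda_j| \le M/N$ with some $M$ for any $i, j \in I_k^M$, where $I_k^M := [\![k - M, k + M]\!]$. Then for $v = (v_{k-M}, \ldots, v_{k+M})$ with $\sum_j v_j = 0$ we have

$$\langle v, \nabla^2\mathcal{H}(\lambda) v\rangle \ge \frac{1}{N}\sum_{i<j;\; i,j\in I_k^M} \frac{(v_i - v_j)^2}{(\lambda_i - \lambda_j)^2} \ge C\,\frac{N}{M}\sum_{i\in I_k^M} v_i^2. \tag{6.7}$$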
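The last inequality in (6.7) rests on an elementary identity: for $n$ numbers with $\sum_j v_j = 0$ one has $\sum_{i<j}(v_i - v_j)^2 = n\sum_i v_i^2$. The following Python sketch checks this numerically and evaluates the resulting lower bound. The prefactor $1/N$ of the interaction term, the function names and the sample values of `N` and `M` are assumptions made only for illustration.

```python
import random


def pairwise_square_sum(v):
    """Return sum_{i<j} (v_i - v_j)^2."""
    n = len(v)
    return sum((v[i] - v[j]) ** 2 for i in range(n) for j in range(i + 1, n))


def interaction_lower_bound(v, N, M):
    """Lower bound on (1/N) * sum_{i<j} (v_i - v_j)^2 / (lam_i - lam_j)^2
    when every gap satisfies |lam_i - lam_j| <= M/N (assumed normalization)."""
    return (1.0 / N) * (N / M) ** 2 * pairwise_square_sum(v)


rng = random.Random(0)
N, M = 10_000, 50          # hypothetical sizes, chosen only for the demo
n = 2 * M + 1              # number of indices in the block [k-M, k+M]

# Random direction projected onto the hyperplane sum_j v_j = 0.
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
mean = sum(v) / n
v = [vi - mean for vi in v]

norm_sq = sum(vi * vi for vi in v)
pair_sum = pairwise_square_sum(v)

# Identity: sum_{i<j} (v_i - v_j)^2 = n * sum_i v_i^2 on the hyperplane.
identity_gap = abs(pair_sum - n * norm_sq)

# Convexity constant per unit |v|^2: N * (2M + 1) / M^2, which is >= 2N / M.
convexity = interaction_lower_bound(v, N, M) / norm_sq
print(identity_gap, convexity, 2 * N / M)
```

Under these assumptions the constant equals $N(2M+1)/M^2$, which is at least $2N/M$ and therefore of order $N/M$.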
This improves the convexity of the Hessian to $N/M$ on the hyperplane $\sum_j v_j = 0$. Let

$$\lambda_k^{[M]} := \frac{1}{2M+1}\sum_{j\in I_k^M} \lambda_j$$

denote the block average of the locations of particles and rewrite

$$\lambda_k - \lambda_k^{[M]} = \sum_i \big(\lambda_k^{[M_i]} - \lambda_k^{[M_{i+1}]}\big)$$

as a telescopic sum with an appropriate sequence $M_1 = 0, M_2, \ldots$. We can now apply the improved concentration on the hyperplane $\sum_j v_j = 0$ to the variables $\lambda_k^{[M_j]} - \lambda_k^{[M_{j+1}]}$ to control the fluctuation of $\lambda_k - \lambda_k^{[M]}$. Since the fluctuation of $\lambda_k^{[M]}$ is very small for small $\varepsilon$, we finally arrive at the estimate

$$\mathbb{P}_\mu\big(|\lambda_k - \mathbb{E}_\mu(\lambda_k)| > a\big) \le C e^{-cN^2a^2/M}. \tag{6.8}$$

From (6.8) we thus have that $|\lambda_k - \mathbb{E}_\mu\lambda_k| \lesssim \sqrt{M}/N$ with high probability. This improves the starting accuracy $|\lambda_i - \lambda_j| \le M/N$ for $i, j \in I_k^M$ to $|\lambda_i - \lambda_j| \le M'/N$ with some $M' \ll M$, provided that we can prove that $|\mathbb{E}_\mu(\lambda_i - \lambda_j)| \le C M'/N$. But the last inequality involves only expectations and it will follow from the analysis of the loop equation (6.5) mentioned above. Starting from $M \sim N$, this procedure can be repeated by decreasing $M$ step by step until we get the optimal accuracy, $M \sim O(1)$. The implementation of this argument in [10] is somewhat different from this sketch due to various technical issues, but it follows the same basic idea.
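To get a feeling for how quickly such a bootstrap terminates, one can iterate a toy version of the improvement step. The sketch below assumes, purely for illustration, that one step improves the scale from $M$ to $M' = C\sqrt{M}$, mirroring the fluctuation size $\sqrt{M}/N$ in (6.8); the constant `C` and the stopping rule are hypothetical and do not come from [10].

```python
import math


def bootstrap_scales(N, C=4.0):
    """Toy recursion M -> C * sqrt(M), started at M = N.

    Stops once M <= 2 * C**2, i.e. once M is of order one.
    C is a hypothetical constant standing in for all error factors.
    """
    stop = 2.0 * C * C
    scales = [float(N)]
    while scales[-1] > stop:
        scales.append(C * math.sqrt(scales[-1]))
    return scales


scales = bootstrap_scales(10**6)
for step, M in enumerate(scales):
    print(step, round(M, 2))
```

In this toy model the exponent of $M$ halves at each step, so the number of steps grows only like $\log\log N$.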



7 Beta ensemble: The local equilibrium measure 

Having completed the first step, the rigidity estimate, we now focus on the second step, i.e. on the uniqueness of the local Gibbs measure. Let $0 < \kappa < 1/2$. Choose $q \in [\kappa, 1 - \kappa]$ and set $L = [Nq]$ (the integer part). Fix an integer $K = N^k$ with $k < 1$. We will study the local spacing statistics of $K$ consecutive particles

$$\{\lambda_j : j \in I\}, \qquad I = I_L := [\![L + 1, L + K]\!].$$

These particles are typically located near $E_q$ determined by the relation

$$\int_{-\infty}^{E_q} \varrho(t)\,\mathrm{d}t = q.$$

Note that $|\gamma_L - E_q| \le C/N$.

We will distinguish the inside and outside particles by renaming them as

$$(\lambda_1, \lambda_2, \ldots, \lambda_N) := (y_1, \ldots, y_L, x_{L+1}, \ldots, x_{L+K}, y_{L+K+1}, \ldots, y_N) \in \Xi^{(N)}, \tag{7.1}$$

but note that they keep their original indices. The notation $\Xi^{(N)}$ refers to the simplex $\{z : z_1 < z_2 < \cdots < z_N\}$ in $\mathbb{R}^N$. In short we will write

$$x = (x_{L+1}, \ldots, x_{L+K}) \quad\text{and}\quad y = (y_1, \ldots, y_L, y_{L+K+1}, \ldots, y_N),$$

all in increasing order, i.e. $x \in \Xi^{(K)}$ and $y \in \Xi^{(N-K)}$. We will refer to the $y$'s as external points and to the $x$'s as internal points.

We will fix the external points (also called boundary conditions) and study conditional measures on the internal points. We define the local equilibrium measure on $x$ with fixed boundary condition $y$ by

$$\mu_y(\mathrm{d}x) = \mu_y(x)\,\mathrm{d}x, \qquad \mu_y(x) := \mu(y, x)\Big[\int \mu(y, x)\,\mathrm{d}x\Big]^{-1}. \tag{7.2}$$

Note that for any fixed $y \in \Xi^{(N-K)}$, the measure $\mu_y$ is supported on configurations of $K$ points $x = \{x_j\}_{j\in I}$ located in the interval $[y_L, y_{L+K+1}]$.

The Hamiltonian $\mathcal{H}_y$ of the measure $\mu_y(\mathrm{d}x) \sim \exp(-\beta N\mathcal{H}_y(x))\,\mathrm{d}x$ is given by

$$\mathcal{H}_y(x) := \sum_{i\in I}\frac{1}{2}V_y(x_i) - \frac{1}{N}\sum_{i,j\in I;\; i<j}\log|x_j - x_i| \quad\text{with}\quad V_y(x) := V(x) - \frac{2}{N}\sum_{j\notin I}\log|x - y_j|. \tag{7.3}$$
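The structure of (7.3) is easy to evaluate directly: the external points enter only through an additional one-body logarithmic potential. The sketch below assumes the normalization $\mathcal{H}_y(x) = \sum_i \frac{1}{2}V_y(x_i) - \frac{1}{N}\sum_{i<j}\log|x_j - x_i|$ with $V_y(x) = V(x) - \frac{2}{N}\sum_j \log|x - y_j|$; the function name, the potential and the toy configuration are hypothetical.

```python
import math


def local_hamiltonian(x, y, V, N):
    """Evaluate H_y(x) from (7.3) under the normalization assumed in the lead-in.

    x : internal points, y : external (boundary) points,
    V : callable potential, N : total number of particles.
    """
    def V_y(t):
        # External points enter as an extra one-body logarithmic potential.
        return V(t) - (2.0 / N) * sum(math.log(abs(t - yj)) for yj in y)

    one_body = sum(0.5 * V_y(xi) for xi in x)
    interaction = sum(
        math.log(abs(x[j] - x[i]))
        for i in range(len(x))
        for j in range(i + 1, len(x))
    )
    return one_body - interaction / N


# Hypothetical toy configuration: N = 4, two internal and two external points.
H = local_hamiltonian(x=[-0.5, 0.5], y=[-1.0, 1.0], V=lambda t: 0.5 * t * t, N=4)
print(H)
```

For this configuration the internal pair is at distance one, so the interaction term vanishes and only the modified one-body potential contributes.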
We now define the set of good boundary configurations with a parameter $\delta = \delta(N) > 0$:

$$\mathcal{G}_\delta = \mathcal{G} := \big\{ y \in \Xi^{(N-K)} : |y_j - \gamma_j| \le \delta, \;\; \forall j \in [\![N\kappa/2, L]\!] \cup [\![L + K + 1, N(1 - \kappa/2)]\!] \big\}, \tag{7.4}$$

where $\kappa$ is a small constant used to cut off points near the spectral edges. Some rather weak additional conditions for $y$ near the spectral edges will also be needed, but we will neglect this issue here.

Let $\sigma$ and $\mu$ be two measures of the form (1.7) with potentials $W$ and $V$ and densities $\varrho_W$ and $\varrho_V$, respectively. For our purpose $W(x) = x^2/2$, i.e., $\sigma$ is the Gaussian $\beta$-ensemble and $\varrho_W(t) = \frac{1}{2\pi}\sqrt{(4 - t^2)_+}$ is the Wigner semicircle law. Let the sequence $\gamma_j$ be the classical locations for $\mu$ and the sequence $\theta_j$ be the classical locations for $\sigma$. Similarly to the construction of the measure $\mu_y$, for any positive integer $L' \in [\![1, N - K]\!]$ we can construct the measure $\sigma_\theta$ conditioned on the particles outside being given by the classical locations $\theta_j$ for $j \notin [\![L' + 1, L' + K]\!]$. More precisely, we define a reference local Gaussian measure $\sigma_\theta$ on the set $[\theta_{L'}, \theta_{L'+K+1}]$ via the Hamiltonian

$$\mathcal{H}_\theta(x) := \sum_{i\in I'}\frac{1}{2}W_\theta(x_i) - \frac{1}{N}\sum_{i,j\in I';\; i<j}\log|x_j - x_i|, \qquad W_\theta(x) := W(x) - \frac{2}{N}\sum_{j\notin I'}\log|x - \theta_j|, \tag{7.5}$$

where $I' := [\![L' + 1, L' + K]\!]$. Since $L'$ will not play an active role, we will abuse the notation and set $L' = L$.
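For the Gaussian reference measure the classical locations can be computed explicitly from the semicircle law. The sketch below assumes the usual quantile convention $\int_{-2}^{\theta_j}\varrho_W(t)\,\mathrm{d}t = j/N$, which is not restated in this section; the function names and the value of `N` are illustrative.

```python
import math


def semicircle_cdf(x):
    """CDF of the semicircle density (1/(2*pi)) * sqrt(4 - x^2) on [-2, 2]."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def classical_location(j, N, tol=1e-12):
    """Solve semicircle_cdf(theta) = j/N by bisection (quantile convention assumed)."""
    target = j / N
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N = 1000
theta = [classical_location(j, N) for j in range(1, N + 1)]   # theta[j-1] = theta_j
bulk_gap = theta[N // 2] - theta[N // 2 - 1]
print(theta[N // 2 - 1], bulk_gap * N)
```

In the middle of the spectrum the rescaled gap $N(\theta_{j+1} - \theta_j)$ is close to $1/\varrho_W(0) = \pi$, which is the local scale on which the gap statistics are compared.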

The measure $\mu_y$ lives on the interval $[y_L, y_{L+K+1}]$ while the measure $\sigma_\theta$ lives on the interval $[\theta_L, \theta_{L+K+1}]$, and it is difficult to compare them. But after an appropriate translation and dilation, they will live on the same interval and from now on we assume that $[y_L, y_{L+K+1}] = [\theta_L, \theta_{L+K+1}]$. The parameter $K = N^k$ has to be sufficiently small since $\varrho_V$ and $\varrho_W$ are not constant functions and we have to match these two densities quite precisely in the whole interval. There are some other subtle issues related to the rescaling, but we will neglect them here to concentrate on the main ideas. Our main result is the following theorem, which is essentially a combination of Proposition 4.2 and Theorem 4.4 from [10].

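Theorem 7.1 Let $0 < \varphi \le \frac{1}{38}$. Fix $K = N^k$, $\delta = N^{-d}$ with $d = 1 - \varphi$ and $k = \frac{39}{2}\varphi$. Then for $y \in \mathcal{G}$ we have

$$\Big|\int \frac{1}{K}\sum_{i\in I} G\big(N(x_i - x_{i+1})\big)\,\mathrm{d}\mu_y - \int \frac{1}{K}\sum_{i\in I} G\big(N(x_i - x_{i+1})\big)\,\mathrm{d}\sigma_\theta\Big| \to 0 \tag{7.6}$$

as $N \to \infty$ for any smooth and compactly supported test function $G$. A similar formula holds for more complicated observables of the form (2.10).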

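The basic idea for proving Theorem 7.1 is to use the Dirichlet form inequality (2.33). Although (2.33) was stated for an infinite volume measure, it holds for any measure with repulsive logarithmic interactions in a finite volume and with the parameter $\tau^{-1}$ being the lower bound on the Hessian of the Hamiltonian. In our setting, we denote by $\tau_\sigma^{-1}$ the lower bound for $\nabla^2\mathcal{H}_\theta$, and the Dirichlet form inequality becomes

$$\Big|\int \frac{1}{K}\sum_{i\in I} G\big(N(x_i - x_{i+1})\big)[\mathrm{d}\mu_y - \mathrm{d}\sigma_\theta]\Big| \le C\Big(\frac{\tau_\sigma N^\varepsilon}{K}\, D(\mu_y\,|\,\sigma_\theta)\Big)^{1/2} + Ce^{-cN^\varepsilon}\sqrt{S(\mu_y\,|\,\sigma_\theta)}, \tag{7.7}$$

where

$$D(\mu_y\,|\,\sigma_\theta) := \frac{1}{2N}\int\Big|\nabla\sqrt{\frac{\mathrm{d}\mu_y}{\mathrm{d}\sigma_\theta}}\Big|^2\,\mathrm{d}\sigma_\theta. \tag{7.8}$$

Thus our task is to prove that

$$\frac{\tau_\sigma N^\varepsilon}{K}\, D(\mu_y\,|\,\sigma_\theta) \to 0. \tag{7.9}$$

By definition,

$$D(\mu_y\,|\,\sigma_\theta) = \frac{N}{8}\int\sum_{L+1\le j\le L+K} Z_j^2\,\mathrm{d}\mu_y,$$

where $Z_j$ is defined as

$$Z_j := \frac{\beta}{2}V'(x_j) - \frac{\beta}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - y_k} - \frac{\beta}{2}W'(x_j) + \frac{\beta}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - \theta_k}. \tag{7.10}$$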



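Using the equilibrium relation (1.10) between the potentials $V$, $W$ and the densities $\varrho_V$, $\varrho_W$, we have

$$Z_j = \beta\int_{\mathbb{R}}\frac{\varrho_V(y)}{x_j - y}\,\mathrm{d}y - \frac{\beta}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - y_k} - \beta\int_{\mathbb{R}}\frac{\varrho_W(y)}{x_j - y}\,\mathrm{d}y + \frac{\beta}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - \theta_k}.$$

Hence $Z_j$ is the sum of the error terms

$$A_j := \beta\int_{y\notin[y_L,\, y_{L+K+1}]}\frac{\varrho_V(y)}{x_j - y}\,\mathrm{d}y - \frac{\beta}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - y_k}, \tag{7.11}$$

$$B_j := \beta\int_{y_L}^{y_{L+K+1}}\frac{\varrho_V(y) - \varrho_W(y)}{x_j - y}\,\mathrm{d}y, \tag{7.12}$$

and there is a term similar to $A_j$ with $y_j$ replaced by $\theta_j$ and $\varrho_V$ replaced by $\varrho_W$.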

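With our convention, the total numbers of particles in the interval $[y_L, y_{L+K+1}]$ are equal and thus

$$\int_{y_L}^{y_{L+K+1}}\varrho_V(y)\,\mathrm{d}y = \int_{y_L}^{y_{L+K+1}}\varrho_W(y)\,\mathrm{d}y.$$

Since the densities $\varrho_V$ and $\varrho_W$ are $C^1$ functions away from the endpoints $A$ and $B$ and $y_{L+K+1} - y_L$ is small, $|\varrho_V - \varrho_W|$ is small in the interval $[y_L, y_{L+K+1}]$ and thus $B_j$ is small. For estimating $A_j$, we can replace the integral $\int_{y\notin[y_L,\, y_{L+K+1}]}\frac{\varrho_V(y)}{x_j - y}\,\mathrm{d}y$ by $\frac{1}{N}\sum_{k\le L,\; k>L+K}\frac{1}{x_j - \gamma_k}$ with negligible errors, at least for $j$'s away from the edges, $j \in [\![L + N^\varepsilon, L + K - N^\varepsilon]\!]$. Thus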

$$|A_j| \lesssim \frac{1}{N}\Big|\sum_{k\le L,\; k>L+K} T_j^k\Big|, \qquad T_j^k := \frac{1}{x_j - y_k} - \frac{1}{x_j - \gamma_k}, \tag{7.13}$$

and $T_j^k$ can be estimated by the assumption $|y_k - \gamma_k| \le \delta$ from $y \in \mathcal{G}$. The same argument works if $j$ is close to the edge but $k$ is away from the edges, i.e. $k < L - N^\varepsilon$ or $k > L + K + N^\varepsilon$. The edge terms, $T_j^k$ for $|j - k| \le N^\varepsilon$, are difficult to estimate due to the singularity in the denominator and the event that many $y_k$'s with $k \le L$ may pile up near $y_L$. To resolve this difficulty, we show that the averaged local statistics of the measure $\mu_y$ are insensitive to the change of the boundary conditions for $y$ near the edges. This can be achieved by the simple inequality

$$\Big|\int\frac{1}{K}\sum_{i\in I} G\big(N(x_i - x_{i+1})\big)[\mathrm{d}\mu_y - \mathrm{d}\mu_{y'}]\Big| \le C\int|\mathrm{d}\mu_y - \mathrm{d}\mu_{y'}| \le C\sqrt{S(\mu_y\,|\,\mu_{y'})} \tag{7.14}$$

for any two boundary conditions $y$ and $y'$. Although we still have to estimate the entropy that includes a logarithmic singularity, this can be done much more easily. Therefore, we can replace the boundary condition $y_k$ with $y'_k = \theta_k$ for $|j - k| \le N^\varepsilon$ and then the most singular edge terms in (7.10) cancel out.

We note that we can perform this replacement only for a small number of index pairs $(j, k)$, since estimating the gap distribution by the total entropy, as noted in (2.36) in Section 2, is not as efficient as the estimate using the Dirichlet form per particle. Thus we can afford to use this argument only for the edge terms, $|j - k| \le N^\varepsilon$. For all other index pairs $(j, k)$ we still have to estimate $T_j^k$ by exploiting that $y$ is a good configuration, i.e. $y_k - \gamma_k$ is small.

Unfortunately, even with the optimal accuracy $\delta \sim N^{-1+\varepsilon}$ in (7.4) as an input, the relation (7.9) still cannot be satisfied for any choice of $N^{C\varepsilon} \le K \le N^{1-C\varepsilon}$. We do not know whether this is due to our handling of the edge terms or some other intrinsic reasons. To understand why this might occur, we remark that while the edge terms become a smaller percentage of the total terms in (7.14) as $K$ gets bigger, the relaxation time to equilibrium for $\sigma_\theta$, determined by the convexity of $\mathcal{H}_\theta$, increases at the same time. At the end of our calculation, there is no good regime for the choice of $K$. Fortunately, this can be resolved by using the idea of the local relaxation measure [34], i.e., we add a quadratic term proportional to $(x_j - \gamma_j)^2$ to the measure $\mu_y$ and one proportional to $(x_j - \theta_j)^2$ to the measure $\sigma_\theta$. With these ideas, we can complete the proof of Theorem 7.1.

8 More general classes of random matrices 

All our results concerning Wigner matrices hold for a broader class of ensembles where the matrix elements $h_{ij}$ still have mean zero, $\mathbb{E}h_{ij} = 0$, but their variances are allowed to vary. More precisely, we assume that the variances $\sigma_{ij}^2 := \mathbb{E}|h_{ij}|^2$ satisfy the normalization condition

$$\sum_{j=1}^N \sigma_{ij}^2 = 1, \qquad i = 1, 2, \ldots, N, \tag{8.1}$$

and they are comparable, i.e.

$$0 < C_{\inf} \le N\sigma_{ij}^2 \le C_{\sup} < \infty, \qquad i, j = 1, 2, \ldots, N, \tag{8.2}$$

for some fixed positive constants $C_{\inf}$ and $C_{\sup}$. These ensembles are called generalized Wigner ensembles. In the special case $\sigma_{ij}^2 = 1/N$, we recover the original Wigner ensemble. All our results concerning the bulk universality, delocalization of eigenvectors and local semicircle laws hold for generalized Wigner matrices as well.

There is another important class of random matrices, the band matrices, which are characterized by the property that $\sigma_{ij}^2$ is a function of $|i - j|$ on scale $W$, which is called the bandwidth, i.e.,

$$\sigma_{ij}^2 = \frac{1}{W} f\Big(\frac{[i - j]_N}{W}\Big), \tag{8.3}$$

where $f : \mathbb{R} \to \mathbb{R}_+$ is a bounded nonnegative symmetric function with $\int f = 1$ and $[i - j]_N = i - j \bmod N$. For this class, the local semicircle law is known to hold at least down to scale $\eta \sim W^{-1}$ and all eigenvectors are delocalized at least on scale $W$. Moreover, most eigenvectors are known to be delocalized on a much larger scale $W^{7/6}$ [23, 24], but smaller than $W^8$ [58], and it is expected that the correct localization length is $W^2$. So far no bulk universality result is known.
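The variance profile (8.3) can be tabulated directly to see how it relates to the normalization (8.1). The sketch below takes $[i - j]_N$ to be the representative of $i - j$ modulo $N$ closest to zero, which is an assumed convention, and uses the indicator of $[-1/2, 1/2]$ as a hypothetical choice of $f$; the sizes `N` and `W` are illustrative.

```python
def band_variances(N, W, f):
    """Variance profile sigma_ij^2 = f([i-j]_N / W) / W from (8.3).

    [i-j]_N is taken as the representative of i-j mod N closest to zero
    (an assumed convention that makes the profile symmetric).
    """
    def periodic_diff(i, j):
        d = (i - j) % N
        return d - N if d > N // 2 else d

    return [[f(periodic_diff(i, j) / W) / W for j in range(N)] for i in range(N)]


def box(u):
    """Hypothetical profile: indicator of [-1/2, 1/2], which integrates to one."""
    return 1.0 if abs(u) <= 0.5 else 0.0


N, W = 200, 21
S = band_variances(N, W, box)
row_sums = [sum(row) for row in S]
symmetric = all(S[i][j] == S[j][i] for i in range(N) for j in range(N))
print(min(row_sums), max(row_sums), symmetric)
```

With this choice every row has exactly $W$ nonzero entries equal to $1/W$, so (8.1) holds, while the lower bound in (8.2) fails because most variances vanish; in this sense band matrices fall outside the generalized Wigner class.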
The significance of the random band matrices stems from the fact that they interpolate between discrete random Schrödinger operators with short range hoppings (Anderson model) and the Wigner matrices. In particular, random matrix spectral statistics are expected to hold in the presumed delocalization regime of the Anderson model in three or higher dimensions. For more details on this exciting connection, see [66].

Finally we mention the ensemble of sample covariance matrices, which play a fundamental role in statistics. These are matrices of the form $H = A^*A$ where $A$ is an $M \times N$ matrix with independent identically distributed entries. The semicircle law is replaced with the Marchenko-Pastur law, but most results listed in this review remain valid. For more details, see [34, 69].

9 Edge universality 

Denote by $\lambda_N$ the largest eigenvalue of a generalized Wigner matrix. The probability distribution functions of $\lambda_N$ for the classical Gaussian ensembles were identified by Tracy and Widom [72, 73] to be

$$\lim_{N\to\infty}\mathbb{P}\big(N^{2/3}(\lambda_N - 2) \le s\big) = F_\beta(s), \tag{9.1}$$

where the functions $F_\beta(s)$ can be computed in terms of Painlevé equations and $\beta = 1, 2, 4$ corresponds to the standard classical ensembles. The distribution of $\lambda_N$ is believed to be universal and independent of the Gaussian structure.

The local semicircle law, Theorem 3.1, combined with a modification of the Green function comparison 
theorem, Theorem 4.1, implies the following version of universality of the extreme eigenvalues. Although it 
holds for correlation functions of finite number of eigenvalues, for simplicity we state it for the largest one 
and for the case of symmetric matrices only. 

Theorem 9.1 (Universality of the largest eigenvalue) [38, Theorem 2.4] Suppose that we have two $N \times N$ symmetric generalized Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (3.3). Let $\mathbb{P}^v$ and $\mathbb{P}^w$ denote the probability and let $\mathbb{E}^v$ and $\mathbb{E}^w$ denote the expectation with respect to these collections of random variables. Suppose that

$$\mathbb{E}^v v_{ij}^2 = \mathbb{E}^w w_{ij}^2. \tag{9.2}$$

Then there is an $\varepsilon > 0$ depending on $\vartheta$ in (3.3) such that for any real parameter $s$ (which may depend on $N$) we have

$$\mathbb{P}^v\big(N^{2/3}(\lambda_N - 2) \le s - N^{-\varepsilon}\big) - N^{-\varepsilon} \le \mathbb{P}^w\big(N^{2/3}(\lambda_N - 2) \le s\big) \le \mathbb{P}^v\big(N^{2/3}(\lambda_N - 2) \le s + N^{-\varepsilon}\big) + N^{-\varepsilon} \tag{9.3}$$

for $N \ge N_0$ sufficiently large, where $N_0$ is independent of $s$.

Note that although Theorem 9.1 states that the edge distribution is universal for a fixed choice of the variances $\sigma_{ij}^2$, it does not identify this distribution. In particular, we do not know if it coincides with the Tracy-Widom distribution apart from the Hermitian case, when the method of [47] can be applied. The extension of Theorem 9.1 to eigenvectors was recently obtained by Knowles and Yin [48], i.e., under the assumption (9.2), the distributions of the eigenvectors belonging to the largest eigenvalues coincide. Similar results hold for the joint distribution of eigenvectors near the edges.
10 Erdős-Rényi matrix

The Erdős-Rényi matrix is the adjacency matrix of the Erdős-Rényi random graph [39, 40]. Its entries are independent (up to the constraint that the matrix be symmetric) and are equal to 1 with probability $p$ and 0 with probability $1 - p$. We rescale the matrix in such a way that its bulk eigenvalues typically lie in an interval of size of order one. Thus we have a symmetric $N \times N$ matrix $A = (a_{ij})$ whose entries $a_{ij}$ are independent (up to the symmetry constraint $a_{ij} = a_{ji}$) and each element is distributed according to

$$a_{ij} = \frac{\gamma}{q}\begin{cases} 1 & \text{with probability } \frac{q^2}{N}, \\ 0 & \text{with probability } 1 - \frac{q^2}{N}, \end{cases} \qquad q := \sqrt{pN}. \tag{10.1}$$

Here $\gamma := (1 - q^2/N)^{-1/2}$ is a scaling introduced for convenience to compare with Wigner matrices. We also assume that $q = \sqrt{pN} \ge (\log N)^{C\log\log N}$; in particular the Erdős-Rényi graph is connected.

Theorem 10.1 (Local semicircle law for Erdős-Rényi matrix) [25, Theorem 2.9] Let $m(z)$ denote the Stieltjes transform of the empirical eigenvalue distribution of the matrix $A$ and let $G(z) = (A - z)^{-1}$ be its resolvent. Assume that the spectral parameter $z = E + i\eta$ satisfies $|E| \le 5$ and $(\log N)^L N^{-1} \le \eta \le 3$ with a sufficiently large constant $L$. Then we have the following two estimates:

(i) The Stieltjes transform of the empirical eigenvalue distribution of A satisfies 



P{\m(z) - m sc (z)\ > (logTV)' 7 [— + -]} <Cexp[-c(logiV) c ]. 
(ii) The individual matrix elements of the Green function satisfy that 



max\Gij(z) - 5 io m sc (z)\ > (log AT) 



c 



3m m sc (z) 



Nn 



1 

Nn 



< C*exp[-c(logAO c 



(10.2) 



(10.3) 



Compared with the local semicircle law, Theorem 3.1, there is an extra factor $1/q$ appearing in the error estimates of Theorem 10.1. This extra error term affects the rigidity estimate of eigenvalues, and (3.2) becomes

$$|\lambda_j - \gamma_j| \le (\log N)^C\big(N^{-2/3}j^{-1/3} + q^{-2}\big), \qquad j \le N/2, \tag{10.4}$$



for $q = \sqrt{pN} \ge N^{1/3}$. We also have an estimate for the regime $q \le N^{1/3}$, but that is weaker. Moreover, under the assumption $q \gg N^{1/3}$, both bulk and edge universality are proved (see Theorems 2.5 and 2.7 in [25]). It is well known that the largest eigenvalue $\lambda_N$ of $A$ satisfies

$$\lambda_N = \gamma q + \frac{1}{\gamma q} + o(1); \tag{10.5}$$

hence it is located far away from the bulk spectrum. Therefore the edge universality for Erdős-Rényi matrices refers to the second largest eigenvalue instead of the largest one. Since the matrix elements of $A$ have nonzero means, both the edge and bulk universality require substantial new ideas in addition to those we have sketched. We refer the interested readers to the original papers [25, 26] for more detailed explanations.
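The separation of the largest eigenvalue from the bulk is easy to observe numerically. The sketch below samples a rescaled Erdős-Rényi matrix as in (10.1), with diagonal entries also drawn as Bernoulli variables (an assumption), and estimates its top eigenvalue by power iteration. It compares the result with the heuristic value $\gamma q + 1/(\gamma q)$, the standard location of an outlier created by a rank-one mean of size $\gamma q$. The parameters `N`, `p`, the seed and all function names are illustrative.

```python
import math
import random


def sample_er_neighbors(N, p, rng):
    """Adjacency lists of a symmetric 0/1 matrix with independent Bernoulli(p)
    entries on and above the diagonal."""
    nbrs = [[] for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            if rng.random() < p:
                nbrs[i].append(j)
                if j != i:
                    nbrs[j].append(i)
    return nbrs


def top_eigenvalue(nbrs, scale, iters=60):
    """Power iteration for A = scale * (0/1 matrix); returns the Rayleigh quotient."""
    n = len(nbrs)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [scale * sum(v[j] for j in row) for row in nbrs]
        lam = sum(wi * vi for wi, vi in zip(w, v))   # v has unit norm
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return lam


N, p = 300, 0.1                      # hypothetical demo parameters
q = math.sqrt(p * N)
gamma = (1.0 - q * q / N) ** -0.5
# Variance of one entry of A; the scaling gamma makes it exactly 1/N.
entry_variance = (gamma / q) ** 2 * (q * q / N) * (1.0 - q * q / N)

rng = random.Random(1)
lam_max = top_eigenvalue(sample_er_neighbors(N, p, rng), gamma / q)
prediction = gamma * q + 1.0 / (gamma * q)
print(entry_variance * N, lam_max, prediction)
```

For these parameters $\gamma q$ is close to 5.8, far to the right of the bulk edge at 2, which is why edge universality is formulated for the second largest eigenvalue.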
11 Historical Remarks

Finally we summarize the recent history related to the universality of local eigenvalue statistics of Wigner matrices. The three-step approach was first introduced in [27] in the context of Hermitian Wigner matrices and it led to the first proof of the Wigner-Dyson-Gaudin-Mehta conjecture for Hermitian Wigner matrices. It works whenever the distributions of the matrix elements are smooth. This approach was followed by all later works on the bulk universalities. We now review the history of Steps 1-3 separately and we start with the history of Step 1, the local semicircle law.

The semicircle law was proved by Wigner for energy windows of order one. Various improvements were made to shrink the spectral windows; in particular, results down to scale $N^{-1/2}$ were obtained by [4] and [44]. The result at the optimal scale, $N^{-1}$, referred to as the local semicircle law, was established for Wigner matrices in a series of papers [30, 31, 32]. The method was based on a self-consistent equation for the Stieltjes transform of the eigenvalues, $m(z)$, and the continuity in the imaginary part of the spectral parameter $z$. As a by-product, the optimal eigenvector delocalization estimate was proved. In order to deal with the generalized Wigner matrices, we needed to consider the self-consistent equation of $G_{ij}(z)$, the matrix elements of the Green function, since there is no closed equation for $m(z) = N^{-1}\mathrm{Tr}\,G(z)$ [36, 37]. In particular, this method implied the optimal rigidity estimate of eigenvalues in the bulk in [37] and up to the edges in [38]. The estimate on $G_{ii}$ provided a simple alternative proof of the eigenvector delocalization estimate. The extension of the local semicircle law to the Erdős-Rényi matrices was recently made in [25].

We now review the history of Step 2. Recall that Hermitian Gaussian divisible ensembles are matrices of the form $e^{-t/2}H_0 + (1 - e^{-t})^{1/2}U$, where $U$ is the GUE and $H_0$ is a Wigner ensemble. The universality of this ensemble for a large class of $H_0$ and for parameters $t$ of order one was proved by Johansson [46]. It was extended to complex sample covariance matrices by Ben Arous and Péché [6]. There were two major restrictions of this method: 1. The Gaussian component was fairly large; it was required to be of order one independent of $N$. 2. The method relies on an explicit formula by Brézin-Hikami [12] for the correlation functions of eigenvalues. This formula originates in the Harish-Chandra-Itzykson-Zuber integral [45] and it is valid only for Gaussian divisible ensembles with unitary invariant Gaussian component. The size of the Gaussian component was reduced to $N^{-1+\varepsilon}$ in [27] by using an improved formula for correlation functions and the local semicircle law from [30, 31, 32].

To eliminate the usage of an explicit formula, a conceptual approach for Step 2 via the local ergodicity of 
Dyson Brownian motion was initiated in [33]. In this paper, the first version of the local relaxation flow was 
introduced, but it was rather complicated. In [34] we found a much simpler way to enhance the convexity of 
the Dyson Brownian motion and we proved a general theorem for local ergodicity of DBM and related flow, 
i.e., Theorem 2.1. This theorem applies to all classical ensembles, i.e., real and complex Wigner matrices, 
real and complex sample covariance matrices and quaternion Wigner matrices. The local relaxation flow in 
the simple form (2.25) first appeared in [34]. The relaxation time to local equilibrium proved in these two 
papers was not optimal; the optimal relaxation time, conjectured by Dyson, was obtained later in [38]. 

The third and final step is to approximate the local eigenvalue distribution of a general Wigner matrix by that of a Gaussian divisible one. The first approximation result was obtained via the reverse heat flow in [27], which required some smoothness of the distribution of matrix elements. Shortly after, Tao and Vu [67] proved a comparison theorem with a four moment matching condition. Instead of using a Gaussian divisible ensemble with a small ($N^{-1+\varepsilon}$) Gaussian component, they relied on Johansson's result [46] to provide Hermitian Gaussian divisible ensembles for comparison. This proved the universality of Hermitian Wigner matrices, provided that the distributions of matrix elements have vanishing third moment and are supported on at least three points. These conditions were removed in [28] by combining the arguments of [27] and [67].
Due to the lack of a Brézin-Hikami type formula for the symmetric matrices, there was no extension of Johansson's result [46] to this case and the universality for symmetric Wigner ensembles was much more difficult to prove. However, the result of [67] implies that the local eigenvalue statistics of symmetric Wigner matrices and GOE are the same, but under the restriction that the first four moments of the matrix elements exactly match those of GOE. The resolution of the Wigner-Dyson-Gaudin-Mehta conjecture for symmetric matrices, i.e., Theorem 2.1 for real symmetric matrices, was obtained in [33, 36]. In these papers, two new ideas were introduced: the local relaxation flow [33] and the Green function comparison theorem [36]. Starting from the paper [36], the variances were allowed to vary and the universality was extended to generalized Wigner matrices. The real Bernoulli random matrices required a more refined argument [37]. Finally, the technical condition assumed in all these papers, i.e., that the probability distributions of the matrix elements decay subexponentially, was reduced to the $(4 + \varepsilon)$-moment assumption (1.8) by using the universality of Erdős-Rényi matrices [26].

The Green function comparison theorem, Theorem 4.1, uses the same four moment conditions which appeared earlier in [67], but it compares matrix elements of Green functions at a fixed energy and not just traces of Green functions, which carry information on eigenvalues near a fixed energy. The result of [67], on the other hand, concerns individual eigenvalues with fixed labels. Both proofs used the local semicircle law and Lindeberg's idea (introduced in his proof of the central limit theorem). Lindeberg's idea in the context of random matrices appeared earlier in a proof of the Wigner semicircle law by Chatterjee [13]. The approach of [67] requires additional difficult estimates due to singularities from neighboring eigenvalues, but the Green function comparison theorem follows directly from the local semicircle law in Step 1, i.e., Theorem 3.1, via standard resolvent expansions. The difficulties associated with the singularities of eigenvalue resonances are completely absent in the Green function comparison theorem. Finally, we mention that Green function comparison can also yield comparison of eigenvalues with fixed labels, see the recent work by Knowles and Yin [48].

The edge universality for Wigner matrices was first proved via the moment method by Soshnikov [65] (see also the earlier work [62]) for Hermitian and symmetric ensembles with symmetric distributions. By combining the moment method and Chebyshev polynomials, Sodin [63, 64] proved edge universality of certain band matrices and some special class of sparse matrices with symmetric distribution. The symmetry assumption was partially removed in [59, 60]. The edge universality without any symmetry assumption was proved in [68] under the condition that the distribution of the matrix elements has subexponential decay and the first three moments match those of a Gaussian distribution. The subexponential decay condition is not optimal for edge universality; in fact the finiteness of the fourth moment was conjectured to be sufficient. For Gaussian divisible Hermitian ensembles this was proved in [47]. This is optimal, since the result by Auffinger, Ben Arous and Péché [3] showed that the distribution of the largest eigenvalues converges to a Poisson process if the entries have at most $4 - \varepsilon$ moments. For Wigner matrices with arbitrary symmetry class, the edge universality was proved under the sole assumption that the matrix entries have $12 + \varepsilon$ moments [26]. Finally, we mention that the extension of universality to eigenvectors near the edge was obtained by Knowles and Yin [48] under the two moment matching condition and with the four moment matching condition in [70].

Although we have focused only on Wigner matrices and $\beta$-ensembles, the ideas summarized in this review should be applicable to a wide class of matrix ensembles. We have already mentioned some natural open questions related to possible improvements of our results. These concern removing some technical conditions such as (i) the restriction $q \gg N^{1/3}$ in the bulk universality of the Erdős-Rényi matrix; (ii) the $12 + \varepsilon$ moment condition for edge universality. A more ambitious goal would be to prove universality for systems with some spatial structure such as band matrices or related models that may open up a path towards universality for random Schrödinger operators and other realistic models of quantum chaos.